Aggregate SQL column values by time period - sql

I have some numerical data that comes in every 5 minutes (i.e. 288 values per day, and quite a few days worth of data). I need to write a query that can return the sums of all values for each day. So currently the table looks like this:
03/30/2010 00:01:00 -- 553
03/30/2010 00:06:00 -- 558
03/30/2010 00:11:00 -- 565
03/30/2010 00:16:00 -- 565
03/30/2010 00:21:00 -- 558
03/30/2010 00:26:00 -- 566
03/30/2010 00:31:00 -- 553
...
And this goes on for 'x' number of days, I'd like the query to return 'x' number of rows, each of which containing the sum of all the values on each day. Something like this:
03/30/2010 -- <sum>
03/31/2010 -- <sum>
04/01/2010 -- <sum>
The query will go inside a Dundas webpart, so unfortunately I can't write custom user functions to assist it. All the logic needs to be in just the one big query. Any help would be appreciated, thanks. I'm trying to get it to work using GROUP BY and DATEPART at the moment, not sure if it's the right way to go about it.

You can use CAST to convert the value to the date type (available from SQL Server 2008 onwards):
SELECT [ENTRY_DATE],CAST([ENTRY_DATE] AS date) AS 'date'
FROM [PROFIT_LIST]
Now you can group by this expression. Note that GROUP BY takes the expression itself, without a column alias:
SELECT CAST([ENTRY_DATE] AS date) AS 'date',SUM(PROFIT)
FROM [PROFIT_LIST]
GROUP BY CAST([ENTRY_DATE] AS date)

Here's a nice trick. If you cast a SQL DATETIME to a FLOAT, it gives you the date as days.fractionofday.
Therefore, if you floor that and turn it back into a DATETIME, it gives you midnight on the given date.
CAST(FLOOR(CAST(MyDateTime AS FLOAT)) AS DATETIME)
Therefore, my favourite way of doing this is.
select
CAST(FLOOR(CAST(OrderDate AS FLOAT)) AS DATETIME)
, sum(taxamt) as Amount
from
Sales.SalesOrderHeader
group by
CAST(FLOOR(CAST(OrderDate AS FLOAT)) AS DATETIME)
I have no idea whether that is more or less efficient than any of the previous correct answers.

I do not see how you could use DATEPART for this since it cannot return only the date part of a DATETIME value (correct me if I am mistaken). What does work is use DATEADD to "null" the time and group by a value of the form YYYY-MM-DD 00:00:00.000.
The following query works against the Adventure Works database in case you happen to have it (tested on SQL Server 2005). Other than using DATEADD it is very similar to @OMG Ponies' suggestion:
select
dateadd(dd, 0, datediff(dd, 0, OrderDate)) as SaleDay
, sum(taxamt) as Amount
from
Sales.SalesOrderHeader
group by
dateadd(dd, 0, datediff(dd, 0, OrderDate))
order by
SaleDay
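The idea of dateadd(dd, 0, datediff(dd, 0, OrderDate)) is to first get the number of whole days that have passed from "the beginning of time" (day 0, which SQL Server treats as 1900-01-01) until your date (the datediff part), and then turn that number of days back into a datetime, which gives you the start of your day. The more common spelling, dateadd(dd, datediff(dd, 0, OrderDate), 0), literally adds that number of days to day 0; both forms return the same value. I hope this is understandable :)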
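To make the arithmetic concrete, here is a small sketch that runs the two steps on one of the sample timestamps from the question. The variable name @d is only for illustration, and DECLARE and SET are kept separate so that the snippet should also run on SQL Server 2005.

```sql
-- Hypothetical variable holding one of the question's sample timestamps.
DECLARE @d datetime;
SET @d = '2010-03-30T00:16:00';

SELECT
    DATEDIFF(dd, 0, @d)                 AS DaysSinceDayZero, -- whole days since 1900-01-01
    DATEADD(dd, DATEDIFF(dd, 0, @d), 0) AS StartOfDay,       -- 2010-03-30 00:00:00.000
    DATEADD(dd, 0, DATEDIFF(dd, 0, @d)) AS StartOfDayAlt;    -- same value, argument order used in the query
```

Because every timestamp on the same calendar day maps to the same StartOfDay value, grouping by that expression yields one row per day.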

Related

Data aggregation by sliding time periods

[Query and question edited and fixed thanks to comments from @Gordon Linoff and @shawnt00]
I recently inherited a SQL query that calculates the number of some events in time windows of 30 days from a log database. It uses a CTE (Common Table Expression) to generate the 30 days ranges since '2019-01-01' to now. And then it counts the cases in those 30/60/90 days intervals. I am not sure this is the best method. All I know is that it takes a long time to run and I do not understand 100% how exactly it works. So I am trying to rebuild it in an efficient way (maybe as it is now is the most efficient way, I do not know).
I have several questions:
One of the things I notice is that instead of using DATEDIFF, the query simply subtracts a number of days from the date. Is that good practice at all?
Is there a better way of doing the time comparisons?
Is there a better way to do the whole thing? The bottom line is: I need to aggregate data by number of occurrences in time periods of 30, 60 and 90 days.
Note: LogDate original format is like 2019-04-01 18:30:12.000.
DECLARE @dt1 Datetime='2019-01-01'
DECLARE @dt2 Datetime=getDate();
WITH ctedaterange
AS (SELECT [Dates]=@dt1
UNION ALL
SELECT [dates] + 30
FROM ctedaterange
WHERE [dates] + 30<= @dt2)
SELECT
[dates],
lt.Activity, COUNT(*) as Total,
SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 90 THEN 1 ELSE 0 END) AS Activity90days,
SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 60 THEN 1 ELSE 0 END) AS Activity60days,
SUM(CASE WHEN lt.LogDate <= dates and lt.LogDate > dates - 30 THEN 1 ELSE 0 END) AS Activity30days
FROM ctedaterange AS cte
JOIN (SELECT Activity, CONVERT(DATE, LogDate) as LogDate FROM LogTable) AS lt
ON cte.[dates] = lt.LogDate
group by [dates], lt.Activity
OPTION (maxrecursion 0)
Sample dataset (LogTable):
LogDate, Activity
2020-02-25 01:10:10.000, Activity01
2020-04-14 01:12:10.000, Activity02
2020-08-18 02:03:53.000, Activity02
2019-10-29 12:25:55.000, Activity01
2019-12-24 18:11:11.000, Activity03
2019-04-02 03:33:09.000, Activity01
Expected Output (the output does not reflect the data shown above for I would need too many lines in the sample set to be shown in this post)
As I said above, the bottom line is: I need to aggregate data by number of occurrences in time periods of 30, 60 and 90 days.
Activity, Activity90days, Activity60days, Activity30days
Activity01, 3, 0, 1
Activity02, 1, 10, 2
Activity03, 5, 1, 3
Thank you for any suggestion.
SQL Server doesn't yet have the option to range over values of the window frame of an analytic function. Since you've generated all possible dates though and you've already got the counts by date, it's very easy to look back a specific number of (aggregated) rows to get the right totals. Here is my suggested expression for 90 days:
sum(count(LogDate)) over (
partition by Activity order by [dates]
rows between 89 preceding and current row
)
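If only the totals as of a single reference date are needed (which is what the expected output in the question looks like), a plain conditional aggregation may be enough and avoids the recursive CTE entirely. This is a different technique from the window-frame expression above, offered only as a sketch: @asOf is a hypothetical variable, while LogTable, LogDate and Activity are taken from the question.

```sql
-- Hypothetical reference point: midnight at the start of tomorrow, so today is included.
DECLARE @asOf datetime;
SET @asOf = DATEADD(day, DATEDIFF(day, 0, GETDATE()) + 1, 0);

SELECT
    Activity,
    COUNT(*) AS Activity90days,  -- every row passing the WHERE clause is within 90 days
    SUM(CASE WHEN LogDate >= DATEADD(day, -60, @asOf) THEN 1 ELSE 0 END) AS Activity60days,
    SUM(CASE WHEN LogDate >= DATEADD(day, -30, @asOf) THEN 1 ELSE 0 END) AS Activity30days
FROM LogTable
WHERE LogDate >= DATEADD(day, -90, @asOf)  -- half-open range: start inclusive,
  AND LogDate <  @asOf                     -- end exclusive
GROUP BY Activity;
```

The three windows are cumulative, as in the original query, and the raw LogDate column is compared against constants, so an index on LogDate can be used. Activities with no rows in the last 90 days do not appear in the result.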

SQL Server : average count of alerts per day, not including days with no alerts

I have a table that acts as a message log, with the two key columns being TIMESTAMP and TEXT. I'm working on a query that grabs all alerts (from TEXT) for the past 30 days (based on TIMESTAMP) and gives a daily average for those alerts.
Here is the query so far:
--goback 30 days start at midnight
declare @olderdate as datetime
set @olderdate = DATEADD(Day, -30, DATEDIFF(Day, 0, GetDate()))
--today at 11:59pm
declare @today as datetime
set @today = dateadd(ms, -3, (dateadd(day, +1, convert(varchar, GETDATE(), 101))))
print @today
--Grab average alerts per day over 30 days
select
avg(x.Alerts * 1.0 / 30)
from
(select count(*) as Alerts
from MESSAGE_LOG
where text like 'The process%'
and text like '%has alerted%'
and TIMESTAMP between @olderdate and @today) X
However, I want to add something that checks whether there were any alerts for a day and, if there are no alerts for that day, doesn't include it in the average. For example, if there are 90 alerts for a month but they're all in one day, I wouldn't want the average to be 3 alerts per day since that's clearly misleading.
Is there a way I can incorporate this into my query? I've searched for other solutions to this but haven't been able to get any to work.
This isn't written for your query, as I don't have any DDL or sample data, so I'm going to provide a very simple example of how you would do this instead.
USE Sandbox;
GO
CREATE TABLE dbo.AlertMessage (ID int IDENTITY(1,1),
AlertDate date);
INSERT INTO dbo.AlertMessage (AlertDate)
VALUES('20190101'),('20190101'),('20190105'),('20190110'),('20190115'),('20190115'),('20190115');
GO
--Use a CTE to count per day:
WITH Tots AS (
SELECT AlertDate,
COUNT(ID) AS Alerts
FROM dbo.AlertMessage
GROUP BY AlertDate)
--Now the average
SELECT AVG(Alerts*1.0) AS DayAverage
FROM Tots;
GO
--Clean up
DROP TABLE dbo.AlertMessage;
You're trying to compute a double-aggregate: The average of daily totals.
Without using a CTE, you can try this as well, which is generalized a bit more to work for multiple months.
--get a list of events per day
DECLARE @Event TABLE
(
ID INT NOT NULL IDENTITY(1, 1)
,DateLocalTz DATE NOT NULL--make sure to handle time zones
,YearLocalTz AS DATEPART(YEAR, DateLocalTz) PERSISTED
,MonthLocalTz AS DATEPART(MONTH, DateLocalTz) PERSISTED
)
/*
INSERT INTO @Event(DateLocalTz)
SELECT CONVERT(DATE, TIMESTAMP)--one row per alert; presumed to be in your local time zone because you did not specify
FROM dbo.MESSAGE_LOG
WHERE UPPER([TEXT]) LIKE 'THE PROCESS%' AND UPPER([TEXT]) LIKE '%HAS ALERTED%'--case insensitive
*/
INSERT INTO @Event(DateLocalTz)
VALUES ('2018-12-31'), ('2019-01-01'), ('2019-01-01'), ('2019-01-01'), ('2019-01-12'), ('2019-01-13')
--get average number of alerts per alerting day each month
-- (this will not return months with no alerts,
-- use a LEFT OUTER JOIN against a month list table if you need to include uneventful months)
SELECT
YearLocalTz
,MonthLocalTz
,AvgAlertsOfAlertingDays = AVG(CONVERT(REAL, NumDailyAlerts))
FROM
(
SELECT
YearLocalTz
,MonthLocalTz
,DateLocalTz
,NumDailyAlerts = COUNT(*)
FROM @Event
GROUP BY YearLocalTz, MonthLocalTz, DateLocalTz
) AS X
GROUP BY YearLocalTz, MonthLocalTz
ORDER BY YearLocalTz ASC, MonthLocalTz ASC
Some things to note in my code:
I use PERSISTED columns to get the month and year date parts (because I'm lazy when populating tables)
Use explicit CONVERT to escape integer math that rounds down decimals. Multiplying by 1.0 is a less-readable hack.
Use CONVERT(DATE, ...) to round down to midnight instead of converting back and forth between strings
Do case-insensitive string searching by making everything uppercase (or lowercase, your preference)
Don't subtract 3 milliseconds to get the very last moment before midnight. Change your semantics to interpret the end of a time range as exclusive, instead of dealing with the precision of your datatypes. The only difference is using explicit comparators (i.e. use < instead of <=). Also, DATETIME resolution is 1/300th of a second, not 3 milliseconds.
Avoid using built-in keywords as column names (i.e. "TEXT"). If you do, wrap them in square brackets to avoid ambiguity.
Instead of dividing by 30 to get the average, divide by the count of distinct days in your results.
select
avg(x.Alerts * 1.0 / x.dd)
from
(select count(*) as Alerts, count(distinct CAST([TIMESTAMP] AS date)) AS dd
...
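Putting the suggestions from these answers together for the original table, a minimal sketch could look like the following. It assumes the MESSAGE_LOG table and the TIMESTAMP and TEXT columns from the question; @rangeStart and @rangeEnd are hypothetical variable names, and the end of the range is treated as exclusive rather than subtracting 3 milliseconds.

```sql
-- Hypothetical range variables: from midnight 30 days ago up to (but excluding) midnight tomorrow.
DECLARE @rangeStart datetime, @rangeEnd datetime;
SET @rangeStart = DATEADD(day, DATEDIFF(day, 0, GETDATE()) - 30, 0);
SET @rangeEnd   = DATEADD(day, DATEDIFF(day, 0, GETDATE()) + 1, 0);

-- Inner query: one row per day that had at least one alert.
-- Outer query: average of those daily counts, so days without alerts are ignored.
SELECT AVG(d.Alerts * 1.0) AS AvgAlertsPerAlertingDay
FROM (
    SELECT COUNT(*) AS Alerts
    FROM MESSAGE_LOG
    WHERE [TEXT] LIKE 'The process%'
      AND [TEXT] LIKE '%has alerted%'
      AND [TIMESTAMP] >= @rangeStart
      AND [TIMESTAMP] <  @rangeEnd
    GROUP BY DATEADD(day, DATEDIFF(day, 0, [TIMESTAMP]), 0)
) AS d;
```

With the example from the question (90 alerts, all on one day), the inner query returns a single row, so the average is 90 rather than 3.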

Querying result from select part of statement

I have a function (dbo.CalcWorkDaysBetween) to work out how many working days there are between two dates:
select
casekey, LoginName, casestartdatedate,
dbo.CalcWorkDaysBetween(casestartdatedate, GETDATE()) AS 'WD'
from
Car_case with (nolock)
where
dbo.CalcWorkDaysBetween(casestartdatedate, GETDATE()) <= DATEADD(dd,DATEDIFF(dd, 0, GETDATE()), -60)
and CaseClosedDateDate is null
order by
CaseStartDateDate asc
In the select part of my statement I want to show the number of working days between the case start date and today's date. This part is fine. But I only want to return cases where the number of working days is 60 or greater, and I'm having trouble with this part of the query. See my code above. I'm not sure why it's not working: it returns results both below and above 60 days, so I've clearly gone wrong somewhere.
Any help would be appreciated!
If I understand correctly, you just need to fix the where condition:
select casekey, LoginName, casestartdatedate,
dbo.CalcWorkDaysBetween(casestartdatedate, GETDATE()) AS WD
from Car_case cc with (nolock)
where dbo.CalcWorkDaysBetween(casestartdatedate, GETDATE()) >= 60 and
CaseClosedDateDate is null
order by CaseStartDateDate asc;
Note: In your version you are comparing the result of the function (which is presumably an integer) to a date.
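If you would rather not spell out the function call twice, one option is to compute it once with CROSS APPLY and refer to the result by name. This is only a sketch of that alternative, assuming the same table and function as above; the aliases cc and w are arbitrary.

```sql
SELECT cc.casekey, cc.LoginName, cc.casestartdatedate, w.WD
FROM Car_case AS cc WITH (NOLOCK)
-- Evaluate the function once per row and expose the result as column WD.
CROSS APPLY (SELECT dbo.CalcWorkDaysBetween(cc.casestartdatedate, GETDATE()) AS WD) AS w
WHERE w.WD >= 60
  AND cc.CaseClosedDateDate IS NULL
ORDER BY cc.casestartdatedate ASC;
```

A column alias defined in the SELECT list cannot be referenced in the WHERE clause of the same query, which is why the expression is moved into the FROM clause here.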

SQL Getting data by the hour

Hi, I have a weather database in SQL Server 2008 that is filled with weather observations taken every 20 minutes. I want to get the weather records for each hour, not every 20 minutes. How can I filter the results so that only the first observation for each hour is returned?
Example:
7:00:00
7:20:00
7:40:00
8:00:00
Desired Output
7:00:00
8:00:00
To get exactly (less the fact that it's an INT instead of a TIME; nothing hard to fix) what you listed as your desired result,
SELECT DISTINCT DATEPART(HOUR, TimeStamp)
FROM Observations
You could also add in CAST(TimeStamp AS DATE) if you wanted that as well.
Assuming you want the data as well, however, it depends a little, but from exactly what you've described, the simple solution is just to say:
SELECT *
FROM Observations
WHERE DATEPART(MINUTE, TimeStamp) = 0
That fails if you have missing data, though, which is pretty common.
If you do have some hours where you want data but don't have a row at :00, you could do something like this:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY CAST(TimeStamp AS DATE), DATEPART(HOUR, TimeStamp) ORDER BY TimeStamp) AS n
FROM Observations
)
SELECT *
FROM cte
WHERE n = 1
That'll take the first one for any date/hour combination.
Of course, you're still leaving out anything where you had no data for an entire hour. That would require a numbers table, if you even want to return those instances.
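You can use a formula like the following one to truncate a point in time (in this case GETUTCDATE()) down to the start of its hour. Because the integer division rounds down, this gives the beginning of the hour rather than the nearest hour.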
SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, 0, GETUTCDATE()) / 60 * 60, 0)
Then you can use this formula in the WHERE clause of your SQL query to get the data you want.
What you need is to GROUP BY your desired time frame, such as the date and the hour. Then you take the MIN value within each time frame. Since you didn't specify which columns you are using, this is the most generic answer I can give.
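As a rough sketch of that idea, assuming a table named Observations with a datetime column named TimeStamp (names borrowed from the other answers here, not confirmed by the question):

```sql
SELECT o.*
FROM Observations AS o
JOIN (
    -- Earliest timestamp within each date-and-hour bucket.
    SELECT MIN([TimeStamp]) AS FirstStamp
    FROM Observations
    GROUP BY DATEADD(HOUR, DATEDIFF(HOUR, 0, [TimeStamp]), 0)
) AS f
    ON o.[TimeStamp] = f.FirstStamp;
```

The GROUP BY expression truncates each timestamp to the start of its hour, so the subquery returns one row per hour that has data, and the join pulls back the full observation rows. If two rows share the exact same earliest timestamp, both are returned.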
Use as filter :
... where DATEPART(MINUTE, DateColumn) = 0
To filter the result for every whole hour, you can set your where clause to check for 00 minute since every whole hour is HH:00:00.
To get the minute part from a time-stamp, you can use DATEPART function.
SELECT *
FROM YOURTABLENAME
WHERE DATEPART(MINUTE, YOURDATEFIELDNAME) = 0
More information on datepart function can be found here: http://www.w3schools.com/sql/func_datepart.asp

Find rows in a database with no time in a datetime column

During testing I have failed to notice an incorrect date/time entry into the database on certain orders. Instead of entering the date and time I have only been entering the date. I was using the correct time stamp createodbcdatetime(now()) however I was using cfsqltype="cf_sql_date" to enter it into the database.
I am lucky enough to have the order date/time correctly recorded, meaning I can use the time from the order date/time field.
My question being can I filter for all rows in the table with only dates entered. My data below;
Table Name: tbl_orders
uid_orders dte_order_stamp
2000 02/07/2012 03:02:52
2001 03/07/2012 01:24:21
2002 03/07/2012 08:34:00
Table Name: tbl_payments
uid_payment dte_pay_paydate uid_pay_orderid
1234 02/07/2012 03:02:52 2000
1235 03/07/2012 2001
1236 03/07/2012 2002
I need to be able to select all payments with no time entered from tbl_payments. I can then loop over the results, grab the time from my order table, add it to the date from my payment table, and update the field with the new date/time.
I can pretty much handle the re-inserting the date/time. It's just selecting the no time rows I'm not sure about?
Any help would be appreciated.
The following is the select statements for both orders and payments and if they need to be joined.(just fyi)
SQL Server 2008, Cold Fusion 9
SELECT
dbo.tbl_orders.uid_orders,
dbo.tbl_orders.dte_order_stamp,
dbo.tbl_payment.dte_pay_paydate,
dbo.tbl_payment.uid_pay_orderid
FROM
dbo.tbl_orders
INNER JOIN dbo.tbl_payment ON (dbo.tbl_orders.uid_orders = dbo.tbl_payment.uid_pay_orderid)
SELECT
dbo.tbl_orders.uid_orders,
dbo.tbl_orders.dte_order_stamp
FROM dbo.tbl_orders
SELECT
uid_paymentid,
uid_pay_orderid,
dte_pay_paydate
FROM
dbo.tbl_payment
Select the records where the hour, minute, second and millisecond values are all zero.
select *
from table
where datePart(hour, datecolumn) = 0
and datePart(minute, datecolumn) = 0
and datePart(second, datecolumn) = 0
and datePart(millisecond, datecolumn) = 0
You can probably get those values by casting to time and checking for 0:
SELECT * FROM table WHERE CAST(datetimecolumn AS TIME) = '00:00'
That may not be particularly efficient though, depending on how smart SQL Server's indexes are.
Something like this should work:
....
WHERE CAST(CONVERT(VARCHAR, dbo.tbl_payment.dte_pay_paydate, 101) AS DATETIME) =
dbo.tbl_payment.dte_pay_paydate
This will return all rows where the time is missing.