How to count every half hour? - sql

I have a query that its counting every hour, using a pivot table.
How would it be possible to get the count for every 30 minutes?
for example 8:00-8:29,8:30-8:59,9:00-9:29 etc. until 5:00
SELECT CONVERT(varchar(8),start_date,1) AS 'Day',
SUM(CASE WHEN DATEPART(hour,start_date) = 8 THEN 1 ELSE 0 END) as eight ,
SUM(CASE WHEN DATEPART(hour,start_date) = 9 THEN 1 ELSE 0 END) AS nine,
SUM(CASE WHEN DATEPART(hour,start_date) = 10 THEN 1 ELSE 0 END) AS ten,
SUM(CASE WHEN DATEPART(hour,start_date) = 11 THEN 1 ELSE 0 END) AS eleven,
SUM(CASE WHEN DATEPART(hour,start_date) = 12 THEN 1 ELSE 0 END) AS twelve,
SUM(CASE WHEN DATEPART(hour,start_date) = 13 THEN 1 ELSE 0 END) AS one_clock,
SUM(CASE WHEN DATEPART(hour,start_date) = 14 THEN 1 ELSE 0 END) AS two_clock,
SUM(CASE WHEN DATEPART(hour,start_date) = 15 THEN 1 ELSE 0 END) AS three_clock,
SUM(CASE WHEN DATEPART(hour,start_date) = 16 THEN 1 ELSE 0 END) AS four_clock
FROM test
where user_id is not null
GROUP BY CONVERT(varchar(8),start_date,1)
ORDER BY CONVERT(varchar(8),start_date,1)
I use sql server 2012 (version Microsoft SQL Server Management Studio 11.0.3128.0)

Try using iif as below:
SELECT CONVERT(varchar(8),start_date,1) AS 'Day', SUM(iif(DATEPART(hour,start_date) = 8 and
DATEPART(minute,start_date) >= 0 and
DATEPART(minute,start_date) =< 29,1,0)) as eight_tirty
FROM test where user_id is not null GROUP BY
CONVERT(varchar(8),start_date,1) ORDER BY
CONVERT(varchar(8),start_date,1)

To get counts by day and half hour, something like this should work.
SELECT day, half_hour, count(1) AS half_hour_count
FROM (
SELECT
CAST(start_date AS date) AS day,
DATEPART(hh, start_date)
+ 0.5*(DATEPART(n,start_date)/30) AS half_hour
FROM test
WHERE user_id IS NOT NULL
) qry
GROUP BY day, half_hour
ORDER BY day, half_hour;
Formatting the result could be done later.

You need a few things, and then this query just falls together.
First, assuming you need multiple dates, you're going to want what's known as a Calendar Table (hands down, probably the most useful analysis table).
Next, you're going to want either an existing Numbers table if you have one, or just generate the first on the fly:
WITH Halfs AS (SELECT CAST(0 AS INT) m
UNION ALL
SELECT m + 1
FROM Halfs
WHERE m < 24 * 2)
SELECT m
FROM Halfs
(recursive CTE - generates a table with a list of numbers starting at 0).
These two tables will provide the basis for a range query based on the timestamps in your main table. This will make it very easy for the optimizer to bucket rows for whatever aggregation you're doing. That's done by CROSS JOINing the two tables together in a subquery, as well as adding a couple of other derived columns:
WITH Halfs AS (SELECT CAST(0 AS INT) m
UNION ALL
SELECT m + 1
FROM Halfs
WHERE m < 24 * 2)
SELECT calendarDate, m, rangeStart, rangeEnd
FROM (SELECT Calendar.calendarDate, Halfs.m rangeGroup,
DATEADD(minutes, m * 30, CAST(Calendar.calendarDate AS DATETIME2) rangeStart,
DATEADD(minutes, (m + 1) * 30, CAST(Calendar.calendarDate AS DATETIME2) rangeEnd
FROM Calendar
CROSS JOIN Halfs
WHERE Calendar.calendarDate >= CAST('20160823' AS DATE)
AND Calendar.calendarDate < CAST('20160830' AS DATE)
-- OR whatever your date range actually is.
) Range
ORDER BY rangeStart
(note that, if the range of dates is sufficiently large, it may be beneficial to save this off as a temporary table with indicies. For small tables and datasets, the performance gain isn't likely to be noticeable)
Now that we have our ranges, it's trivial to get our groups, and pivot the table.
Oh, and SQL Server has a specific operator for PIVOTing.
WITH Halfs AS (SELECT CAST(0 AS INT) m
UNION ALL
SELECT m + 1
FROM Halfs
WHERE m < 3 * 2)
-- Intentionally limiting range for example only
SELECT calendarDate AS day, [0], [1], [2], [3], [4], [5], [6]
-- If you're displaying "nice" names,
-- do it at this point, or in the reporting application
FROM (SELECT Range.calendarDate, Range.rangeGroup
FROM (SELECT Calendar.calendarDate, Halfs.m rangeGroup,
DATEADD(minutes, m * 30, CAST(Calendar.calendarDate AS DATETIME2) rangeStart,
DATEADD(minutes, (m + 1) * 30, CAST(Calendar.calendarDate AS DATETIME2) rangeEnd
FROM Calendar
CROSS JOIN Halfs
WHERE Calendar.calendarDate >= CAST('20160823' AS DATE)
AND Calendar.calendarDate < CAST('20160830' AS DATE)
-- OR whatever your date range actually is.
) Range
LEFT JOIN Test
ON Test.user_id IS NOT NULL
AND Test.start_date >= Range.rangeStart
AND Test.start_date < Range.rangeEnd
) AS DataTable
PIVOT (COUNT(*)
FOR Range.rangeGroup IN ([0], [1], [2], [3], [4], [5], [6])) AS PT
-- Only covers the first 6 groups,
-- or the first three hours.
ORDER BY day
The pivot should take care of the getting individual columns, and COUNT will automatically resolve null rows. Should be all you need.

Related

CASE WHEN condition with MAX() function

There are a lot questions on CASE WHEN topic, but the closest my question is related to this How to use CASE WHEN condition with MAX() function query which has not been resolved.
Here is some of my sample data:
date
debet
2022-07-15
57190.33
2022-07-14
815616516.00
2022-07-15
40866.67
2022-07-14
1221510.00
So, I want to all records for the last two dates and three additional columns: sum(sales) for the previous day, sum for the current day and the difference between them:
SELECT
[debet],
[date] ,
SUM( CASE WHEN [date] = MAX(date) THEN [debet] ELSE 0 END ) AS sum_act,
SUM( CASE WHEN [date] = MAX(date) - 1 THEN [debet] ELSE 0 END ) AS sum_prev ,
(
SUM( CASE WHEN [date] = MAX(date) THEN [debet] ELSE 0 END )
-
SUM( CASE WHEN [date] = MAX(date) - 1 THEN [debet] ELSE 0 END )
) AS diff
FROM
Table
WHERE
[date] = ( SELECT MAX(date) FROM Table WHERE date < ( SELECT MAX(date) FROM Table) )
OR
[date] = ( SELECT MAX(date) FROM Table WHERE date = ( SELECT MAX(date) FROM Table ) )
GROUP BY
[date],
[debet]
Further, of course, it informs that I can't use the aggregate function inside CASE WHEN. Now I use this combination: sum(CASE WHEN [date] = dateadd(dd,-3,cast(getdate() as date)) THEN [debet] ELSE 0 END). But here every time I need to make an adjustment for weekends and holidays. The question is, is there any other way than using 'getdate' in 'case when' Statement to get max date?
Expected result:
date
sum_act
sum_prev
diff
2022-07-15
97190.33
0.00
97190.33
2022-07-14
0.00
508769.96
-508769.96
You can use dense_rank() to filter the last 2 dates in your table. After that you can use either conditional case expression with sum() to calculate the required value
select [date],
sum_act = sum(case when rn = 1 then [debet] else 0 end),
sum_prev = sum(case when rn = 2 then [debet] else 0 end),
diff = sum(case when rn = 1 then [debet] else 0 end)
- sum(case when rn = 2 then [debet] else 0 end)
from
(
select *, rn = dense_rank() over (order by [date] desc)
from tbl
) t
where rn <= 2
group by [date]
db<>fiddle demo
Two steps:
Get the sums for the last three dates
Show the results for the last two dates.
Well, we could also get all daily sums in step 1, but we just need the last three in order to calculate the sums for the last two days, so why aggregate more data than necessary?
Here is the query. You may have to put the date column name in brackets in SQL Server, as date is a keyword in SQL.
select top(2)
date,
sum_debit_current,
sum_debit_previous,
sum_debit_current - sum_debit_previous as diff
(
select
date,
sum(debet) as sum_debit_current,
lag(sum(debet)) over (order by date) as sum_debit_previous
from table
where date in (select distinct top(3) date from table order by date desc)
group by date
)
order by date desc;
(SQL Server uses TOP(n) instead of standard SQL FETCH FIRST 3 ROWS and while SELECT DISTINCT TOP(3) date looks like "get the top 3 rows, then apply distinct on their date", it is really "apply distinct on the dates, then get the top 3" like in standard SQL.)

SQL query to aggregate month in one table

I have a table with one column (TK) with multiple values, also duplicated and another one column with date.
I need to return a table with first column with distinct(TK) and the other columns like month.
I do an example into SQL FIDDLE
http://sqlfiddle.com/#!18/14cb9f/28
TK
JANUARY
open a
4
open B
4
TK
FEBRUARY
open a
4
open B
4
I need
TK
JANUARY
FEBRUARY
open a
4
4
open B
4
4
Thanks
A simple conditional aggregation should do the trick
SELECT TK
,Janary = sum( case when month(datastart)=1 then 1 else 0 end )
,February = sum( case when month(datastart)=2 then 1 else 0 end )
From TEST
Where year(datastart)=2021
Group By TK
Or you can use PIVOT
Select *
From (
Select TK
,Col = datename(month,DataStart)
,Val = 1
From TEST
Where year(datastart)=2021
) src
Pivot ( sum(Val) for Col in ([January] ,[February] ) ) pvt
There are multiple ways to do this, but avoiding sub-queries and making the syntax simple to read, this is the simplest I can get:
SELECT
TK,
SUM(
CASE WHEN DATASTART >= '2021-01-01' AND DATASTART < '2021-02-01' THEN 1 ELSE 0 END
) AS JENUARY,
SUM(
CASE WHEN DATASTART >= '2021-02-01' AND DATASTART <= '2021-02-28' THEN 1 ELSE 0 END
) AS FEBRUARY
FROM
Test
GROUP BY
TK
Check it out
http://sqlfiddle.com/#!18/14cb9f/34

SQL Server query, remove date dimension

I need help in removing the date dimension from the query below. In other words make the query independent of the date / time interval
My goal is to load the table into SSAS so that i would not have to change the date every time i run reports.
the query is huge (months, quarters, years, and aggregated date CR12,PR12 ...), i just gave a short example below
I sincerly appreciate any help
drop table #tmptmp
SELECT *, (DATEDIFF(day, enrollmentsDate, ShipmentDate))
- ((DATEDIFF(WEEK, enrollmentsenttDate, InitialShipmentDate) * 2)
+(CASE WHEN DATENAME(DW, enrollmentsentDate) = 'Sunday' THEN 1 ELSE 0 END)
+(CASE WHEN DATENAME(DW, ShipmentDate) = 'Saturday' THEN 1 ELSE 0 END)
- (select count(*) from tblFactoryHolidayDates where Date >= enrollmentsentDate
and Date < InitialShipmentDate)) as countdays into #tmptmp from
#tmpTouchpointsEnrollments
where EnrollmentSentDate is not null
----------------------------
drop table #tmp
select * into #tmp
from #tmptmp
where countdays < 20
drop table #tmpMetric
Select 'GrandTotal' as Dummy,'Avg days' as Metrics,'1' as MetricOrder,
Sum(case when Year(EnrollmentReceiveddate) ='2010' then (countdays) end) *1.0/
count(case when Year(EnrollmentReceiveddate) ='2010' then (patientID) end) *1.0 as Y2010,
into #tmpMetric
from #tmp
Thank you very much

More efficient way of grouping rows by hour (using a timestamp)

I'm trying to show a log of daily transactions that take place. My current method is embarrassingly inefficient and I'm sure there is a much better solution. Here is my current query:
select ReaderMACAddress,
count(typeid) as 'Total Transactions',
SUM(CASE WHEN CAST("Timestamp" as TIME) between '05:00:00' and '11:59:59' THEN 1 ELSE 0 END) as 'Morning(5am-12pm)',
SUM(CASE WHEN CAST("Timestamp" as TIME) between '12:00:00' and '17:59:59' THEN 1 ELSE 0 END) as 'AfternoonActivity(12pm-6pm)',
SUM(CASE WHEN CAST("Timestamp" as TIME) between '18:00:00' and '23:59:59' THEN 1 ELSE 0 END) as 'EveningActivity(6pm-12am)',
SUM(CASE WHEN CAST("Timestamp" as TIME) between '00:00:00' and '04:59:59' THEN 1 ELSE 0 END) as 'OtherActivity(12am-5am)'
from Transactions
where ReaderMACAddress = '0014f54033f5'
Group by ReaderMACAddress;
which returns the results:
ReaderMACAddress Total Transactions Morning(5am-12pm) AfternoonActivity(12pm-6pm) EveningActivity(6pm-12am) OtherActivity(12am-5am)
0014f54033f5 932 269 431 232 0
(sorry for any alignment issues here)
At the moment I only want to look at a single Reader that I specify (through the where clause). Ideally, it would be easier to read if the time sections were in a single column and the results, i.e. a count function were in a second column yielding results such as:
Total Transactions 932
Morning(5am-12pm) 269
AfternoonActivity(12pm-6pm) 431
EveningActivity(6pm-12am) 232
OtherActivity(12am-5am) 0
Thanks for any help :)
I would first consider a computed column, but I believe from a previous post you don't have the ability to change the schema. So how about a view?
CREATE VIEW dbo.GroupedReaderView
AS
SELECT ReaderMACAddress,
Slot = CASE WHEN t >= '05:00' AND t < '12:00' THEN 1
WHEN t >= '12:00' AND t < '18:00' THEN 2
WHEN t >= '18:00' THEN 3 ELSE 4 END
FROM
(
SELECT ReaderMACAddress, t = CONVERT(TIME, [Timestamp])
FROM dbo.Transactions
) AS x;
Now your per-MAC address query is much, much simpler:
SELECT Slot, COUNT(*)
FROM dbo.GroupedReaderView
WHERE ReaderMACAddress = '00...'
GROUP BY Slot;
This will provide a result like:
1 269
2 431
3 232
4 0
You can also add WITH ROLLUP which will provide a grand total with the Slot column being NULL:
SELECT Slot, COUNT(*)
FROM dbo.GroupedReaderView
WHERE ReaderMACAddress = '00...'
GROUP BY Slot
WITH ROLLUP;
Should yield:
1 269
2 431
3 232
4 0
NULL 932
And you can pivot that if you need to, add labels per slot, etc. in your presentation tier.
You could also do it this way, it just makes the view a lot more verbose and pulls a lot of extra data when you query it directly; it's also slightly less efficient to group by strings.
CREATE VIEW dbo.GroupedReaderView
AS
SELECT ReaderMACAddress,
Slot = CASE WHEN t >= '05:00' AND t < '12:00' THEN
'Morning(5am-12pm)'
WHEN t >= '12:00' AND t < '18:00' THEN
'Afternoon(12pm-6pm)'
WHEN t >= '18:00' THEN
'Evening(6pm-12am)'
ELSE
'Other(12am-5am)'
END
FROM
(
SELECT ReaderMACAddress, t = CONVERT(TIME, [Timestamp])
FROM dbo.Transactions
) AS x;
These aren't necessarily more efficient than what you've got, but they're less repetitive and easier on the eyes. :-)
Also if you don't want to (or can't) create a view, you can just put that into a subquery, e.g.
SELECT Slot, COUNT(*)
FROM
(
SELECT ReaderMACAddress,
Slot = CASE WHEN t >= '05:00' AND t < '12:00' THEN
'Morning(5am-12pm)'
WHEN t >= '12:00' AND t < '18:00' THEN
'Afternoon(12pm-6pm)'
WHEN t >= '18:00' THEN
'Evening(6pm-12am)'
ELSE
'Other(12am-5am)'
END
FROM
(
SELECT ReaderMACAddress, t = CONVERT(TIME, [Timestamp])
FROM dbo.Transactions
) AS x
) AS y
WHERE ReaderMACAddress = '00...'
GROUP BY Slot
WITH ROLLUP;
Just an alternative that still lets you use BETWEEN and may be even a little less verbose:
SELECT Slot, COUNT(*)
FROM
(
SELECT ReaderMACAddress,
Slot = CASE WHEN h BETWEEN 5 AND 11 THEN 'Morning(5am-12pm)'
WHEN h BETWEEN 12 AND 17 THEN 'Afternoon(12pm-6pm)'
WHEN h >= 18 THEN 'Evening(6pm-12am)'
ELSE 'Other(12am-5am)'
END
FROM
(
SELECT ReaderMACAddress, h = DATEPART(HOUR, [Timestamp])
FROM dbo.Transactions
) AS x
) AS y
WHERE ReaderMACAddress = '00...'
GROUP BY Slot
WITH ROLLUP;
UPDATE
To always include each slot even if there are no results for that slot:
;WITH slots(s, label, h1, h2) AS
(
SELECT 1, 'Morning(5am-12pm)' , 5, 11
UNION ALL SELECT 2, 'Afternoon(12pm-6pm)' , 12, 17
UNION ALL SELECT 3, 'Evening(6pm-12am)' , 18, 23
UNION ALL SELECT 4, 'Other(12am-5am)' , 0, 4
)
SELECT s.label, c = COALESCE(COUNT(y.ReaderMACAddress), 0)
FROM slots AS s
LEFT OUTER JOIN
(
SELECT ReaderMACAddress, h = DATEPART(HOUR, [Timestamp])
FROM dbo.Transactions
WHERE ReaderMACAddress = '00...'
) AS y
ON y.h BETWEEN s.h1 AND s.h2
GROUP BY s.label
WITH ROLLUP;
The key in all of these cases is to simplify and not repeat yourself. Even if SQL Server only performs it once, why convert to time 4+ times?

SQL statement to get record datetime field value as column of result

I have the following two tables
activity(activity_id, title, description, group_id)
statistic(statistic_id, activity_id, date, user_id, result)
group_id and user_id come from active directory. Result is an integer.
Given a user_id and a date range of 6 days (Mon - Sat) which I've calculated on the business logic side, and the fact that some of the dates in the date range may not have a statistic result for the particular date (ie. day1 and day 4 may have entered statistic rows for a particular activity, but there may not be any entries for days 2, 3, 5 and 6) how can I get a SQL result with the following format? Keep in mind that if a particular activity doesn't have a record for the particular date in the statistics table, then that day should return 0 in the SQL result.
activity_id group_id day1result day2result day3result day4result day5result day6 result
----------- -------- ---------- ---------- ---------- ---------- ---------- -----------
sample1 Secured 0 5 1 0 2 1
sample2 Unsecured 1 0 0 4 3 2
Note: Currently I am planning on handling this in the business logic, but that would require multiple queries (one to create a list of distinct activities for that user for the date range, and one for each activity looping through each date for a result or lack of result, to populate the 2nd dimension of the array with date-related results). That could end up with 50+ queries for each user per date range, which seems like overkill to me.
I got this working for 4 days and I can get it working for all 6 days, but it seems like overkill. Is there a way to simplify this?:
SELECT d1d2.activity_id, ISNULL(d1d2.result1,0) AS day1, ISNULL(d1d2.result2,0) AS day2, ISNULL(d3d4.result3,0) AS day3, ISNULL(d3d4.result4,0) AS day4
FROM
(SELECT ISNULL(d1.activity_id,0) AS activity_id, ISNULL(result1,0) AS result1, ISNULL(result2,0) AS result2
FROM
(SELECT ISNULL(statistic_result,0) AS result1, ISNULL(activity_id,0) AS activity_id
FROM statistic
WHERE user_id='jeremiah' AND statistic_date='11/22/2011'
) d1
FROM JOIN
(SELECT ISNULL(statistic_result,0) AS result2, ISNULL(activity_id,0) AS activity_id
FROM statistic WHERE user_id='jeremiah' AND statistic_date='11/23/2011'
) d2
ON d1.activity_id=d2.activity_id
) d1d2
FULL JOIN
(SELECT d3.activity_id AS activity_id, ISNULL(d3.result3,0) AS result3, ISNULL(d4.result4,0) AS result4
FROM
(SELECT ISNULL(statistic_result,0) AS result3, ISNULL(activity_id,0) AS activity_id
FROM statistic WHERE user_id='jeremiah' AND statistic_date='11/24/2011'
) d3
FULL JOIN
(SELECT ISNULL(statistic_result,0) AS result4, ISNULL(activity_id,0) AS activity_id
FROM statistic WHERE user_id='jeremiah' AND statistic_date='11/25/2011'
) d4
ON d3.activity_id=d4.activity_id
) d3d4
ON d1d2.activity_id=d3d4.activity_id
ORDER BY d1d2.activity_id
Here is a typical approach for this kind of thing:
DECLARE #minDate DATETIME,
#maxdate DATETIME,
#userID VARCHAR(200)
SELECT #minDate = '2011-11-15 00:00:00',
#maxDate = '2011-11-22 23:59:59',
#userID = 'jeremiah'
SELECT A.activity_id, A.group_id,
SUM(CASE WHEN DATEDIFF(day, #minDate, S.date) = 0 THEN S.Result ELSE 0 END) AS Day1Result,
SUM(CASE WHEN DATEDIFF(day, #minDate, S.date) = 1 THEN S.Result ELSE 0 END) AS Day2Result,
SUM(CASE WHEN DATEDIFF(day, #minDate, S.date) = 2 THEN S.Result ELSE 0 END) AS Day3Result,
SUM(CASE WHEN DATEDIFF(day, #minDate, S.date) = 3 THEN S.Result ELSE 0 END) AS Day4Result,
SUM(CASE WHEN DATEDIFF(day, #minDate, S.date) = 4 THEN S.Result ELSE 0 END) AS Day5Result,
SUM(CASE WHEN DATEDIFF(day, #minDate, S.date) = 5 THEN S.Result ELSE 0 END) AS Day6Result
FROM activity A
LEFT OUTER JOIN statistic S
ON A.activity_id = S.activity_ID
AND S.user_id = #userID
WHERE S.date between #minDate AND #maxDate
GROUP BY A.activity_id, A.group_id
First, I'm using group by to reduce the resultset to one row per activity_id/group_id, then I'm using CASE to separate values for each individual column. In this case I'm looking at which day in the last seven, but you can use whatever logic there to determine what date. The case statements will return the value of S.result if the row is for that particular day, or 0 if it's not. SUM will add up the individual values (or just the one, if there is only one) and consolidate that into a single row.
You'll also note my date range is based on midnight on the first day in the range and 11:59PM on the last day of the range to ensure all times are included in the range.
Finally, I'm performing a left join so you will always have a 0 in your columns, even if there are no statistics.
I'm not entirely sure how your results are segregated by group in addition to activity (unless group is a higher level construct), but here is the approach I would take:
SELECT activity_id
day1result = SUM(CASE DATEPART(weekday, date) WHEN 1 THEN result ELSE 0 END)
FROM statistic
GROUP BY activity_id
I will leave the rest of the days and addition of group_id to you, but you should see the general approach.