Select Consecutive Numbers in SQL

This feels simple, but I can't find an answer anywhere.
I'm trying to run a query by time of day for each hour. So I'm doing a Group By on the hour part, but not all hours have data, so there are some gaps. I'd like to display every hour, regardless of whether or not there's data.
Here's a sample query:
SELECT DATEPART(HOUR, DATEADD(HH,-5, CreationDate)) As Hour,
COUNT(*) AS Count
FROM Comments
WHERE UserId = ##UserId##
GROUP BY DATEPART(HOUR, DATEADD(HH,-5, CreationDate))
My thought was to join to a table that already had the numbers 1 through 24 so that the incoming data would get put in its place.
Can I do this with a CTE?
WITH Hours AS (
SELECT i As Hour --Not Sure on this
FROM [1,2,3...24]), --Not Sure on this
CommentTimes AS (
SELECT DATEPART(HOUR, DATEADD(HH,-5, CreationDate)) AS Hour,
COUNT(*) AS Count
FROM Comments
WHERE UserId = ##UserId##
GROUP BY DATEPART(HOUR, DATEADD(HH,-5, CreationDate))
)
SELECT h.Hour, c.Count
FROM Hours h
JOIN CommentTimes c ON h.Hour = c.Hour
Here's a sample query from Stack Exchange Data Explorer.

You can use a recursive query to build up a table of whatever numbers you want. Here we generate 0 through 23, because DATEPART(HOUR, ...) returns values in that range. Then left join that to your comments to ensure every hour is represented. You can turn these into times easily if you want. I also renamed your Hour column, since hour is a keyword.
;with dayHours as (
select 0 as HourValue
union all select hourValue + 1
from dayHours
where hourValue < 23
)
,
CommentTimes As (
SELECT DATEPART(HOUR, DATEADD(HH,-5, CreationDate)) As HourValue,
COUNT(*) AS Count
FROM Comments
WHERE UserId = ##UserId##
GROUP BY DATEPART(HOUR, DATEADD(HH,-5, CreationDate)))
SELECT h.HourValue, c.Count
FROM dayHours h
left JOIN CommentTimes c ON h.HourValue = c.HourValue
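For anyone who wants to try the pattern without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is not the query above verbatim: SQLite needs WITH RECURSIVE and strftime instead of DATEPART, the Comments table holds made-up timestamps, and the time zone shift and UserId filter are left out. The function name comments_per_hour is hypothetical.

```python
import sqlite3

def comments_per_hour(timestamps):
    """Return [(hour, count)] for all 24 hours, with 0 for hours that have no rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Comments (CreationDate TEXT)")
    conn.executemany("INSERT INTO Comments VALUES (?)", [(t,) for t in timestamps])
    rows = conn.execute("""
        WITH RECURSIVE dayHours(HourValue) AS (
            SELECT 0                       -- anchor row
            UNION ALL
            SELECT HourValue + 1           -- recursive step
            FROM dayHours
            WHERE HourValue < 23
        ),
        CommentTimes AS (
            SELECT CAST(strftime('%H', CreationDate) AS INTEGER) AS HourValue,
                   COUNT(*) AS Cnt
            FROM Comments
            GROUP BY CAST(strftime('%H', CreationDate) AS INTEGER)
        )
        SELECT h.HourValue, COALESCE(c.Cnt, 0)
        FROM dayHours h
        LEFT JOIN CommentTimes c ON h.HourValue = c.HourValue
        ORDER BY h.HourValue
    """).fetchall()
    conn.close()
    return rows

result = comments_per_hour([
    "2024-01-01 09:15:00",
    "2024-01-02 09:45:00",
    "2024-01-01 00:05:00",
])
for hour, count in result:
    print(hour, count)
```

The COALESCE turns the NULL that the left join produces for empty hours into 0, which is usually what a chart or report wants.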

You can use a table value constructor:
with hours as (
SELECT hr
FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12)) AS b(hr)
)
etc..
You can also use a permanent auxiliary numbers table:
http://dataeducation.com/you-require-a-numbers-table/
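Both options can be tried out quickly with Python's sqlite3 module. This is only a sketch: SQLite names the column through the CTE's column list rather than the AS b(hr) form shown above, and the table name Numbers and its size of 1000 rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: inline row constructor, named through the CTE's column list.
inline_hours = [row[0] for row in conn.execute(
    "WITH hours(hr) AS (VALUES (1), (2), (3), (4), (5), (6)) "
    "SELECT hr FROM hours ORDER BY hr"
)]

# Option 2: a permanent numbers table, filled once and reused by any query.
conn.execute("CREATE TABLE Numbers (n INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Numbers (n) VALUES (?)",
                 [(i,) for i in range(1, 1001)])
table_hours = [row[0] for row in conn.execute(
    "SELECT n FROM Numbers WHERE n <= 24 ORDER BY n"
)]

print(inline_hours)
print(table_hours)
```

The constructor is handy for a short, fixed list; the permanent table pays off once several queries need a sequence.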

Use a recursive CTE to generate the hours:
with hours as (
select 1 as hour
union all
select hour + 1
from hours
where hour < 24
)
. . .
Then your full query needs a left outer join:
with hours as (
select 1 as hour
union all
select hour + 1
from hours
where hour < 24
),
CommentTimes As (
SELECT DATEPART(HOUR, DATEADD(HH,-5, CreationDate)) As Hour,
COUNT(*) AS Count
FROM Comments
WHERE UserId = ##UserId##
GROUP BY DATEPART(HOUR, DATEADD(HH,-5, CreationDate))
)
SELECT h.Hour, c.Count
FROM Hours h LEFT OUTER JOIN
CommentTimes c
ON h.Hour = c.Hour;

Below is a demo for SQL Server that does not use a recursive CTE:
select h.hour ,c.count
from (
select top 24 number + 1 as hour from master..spt_values
where type = 'P'
) h
left join (
select datepart(hour, creationdate) as hour,count(1) count
from comments
where userid = '9131476'
group by datepart(hour, creationdate)
) c on h.hour = c.hour
order by h.hour;
online demo link : consecutive number query demo - Stack Exchange Data Explorer

The basic idea is correct, but you will want to perform a left join instead of a standard (inner) join. A left join keeps every row from the left-hand table, here the hours, even when no comments match that hour; an inner join would drop those hours.
With respect to how to create the original hours table, you can either directly create it with something like:
SELECT 1 as hour
UNION ALL
SELECT 2 as hour
...
UNION ALL
SELECT 24 as hour
or, you can create a permanent table populated with these values. (I do not recall immediately on SqlServer if there is a better way to do this, or if selecting a value but not from a table is allowed. On Oracle, you could select from the built-in table 'dual' which is a table containing a single row).

As a more general abstraction of this issue, you can create consecutive numbers, as Brad and Gordon have suggested, with a recursive CTE like this:
WITH Numbers AS (
SELECT 1 AS Number
UNION ALL SELECT Number + 1
FROM Numbers
WHERE Number < 1000
)
SELECT * FROM Numbers
OPTION (MaxRecursion 0)
As a note, if you plan to go over 100 numbers, you'll need to add OPTION (MaxRecursion 0) to the end of your query to prevent the error "The maximum recursion 100 has been exhausted before statement completion". A value of 0 removes the recursion limit entirely.
This technique can commonly be seen when populating or using a Tally Table in T-SQL.

Related

Calculate Consecutive Concurrent Calls SQL Server

I only have basic SQL skills and am hoping someone can help me out. I am using SQL Server and trying to come up with a query that calculates the number of concurrent calls happening at the same time, per day. My company only has a license for 300 concurrent calls, and we're trying to find the maximum we reach per day. Basically, if 3 people are on a call at 9:00 am and all 3 calls end at 9:15, the count would be 3. If another call starts at 9:05 am and ends at 9:20 am, the count is now 4, but at 9:16 am the count would only be 1.
I have a table (conferencecall2) with following columns:
CallID, UniqueCallID, Jointime, Leavetime
We get about 5000-6000 calls per day
Below is sample of some data.
The key here is to have (or generate) a table with one row for each time period. Then it's a simple APPLY or scalar subquery:
select t.minute, c.calls
from time_table_with_one_row_per_minute t
cross apply
(
select count(*) calls
from calls c
where t.Minute >= c.JoinTime
and t.Minute <= c.LeaveTime
) c
You can do this by unpivoting the columns, then using window functions:
select x.call_time, sum(sum(x.cnt_calls)) over(order by x.call_time) as cnt
from conferencecall2 c
cross apply (values (c.jointime, 1), (c.leavetime, -1)) as x(call_time, cnt_calls)
group by x.call_time
This solution scans the table only once, so I would expect it to perform efficiently over a large dataset.
Edit: you can get the peak of concurrent calls per day with another level of subquery:
select convert(date, call_time) as call_day, max(cnt) as peak_cnt
from (
select x.call_time, sum(sum(x.cnt_calls)) over(order by x.call_time) as cnt
from conferencecall2 c
cross apply (values (c.jointime, 1), (c.leavetime, -1)) as x(call_time, cnt_calls)
group by x.call_time
) c
group by convert(date, call_time)
Edit 2
If you want to filter, then you need to do that in the outer query:
select convert(date, call_time) as call_day, max(cnt) as peak_cnt
from (
select x.call_time, sum(sum(x.cnt_calls)) over(order by x.call_time) as cnt
from conferencecall2 c
cross apply (values (c.jointime, 1), (c.leavetime, -1)) as x(call_time, cnt_calls)
group by x.call_time
) c
where call_time >= @starttime and call_time < @endtime
group by convert(date, call_time)
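The +1/-1 idea behind the unpivot query can also be checked outside the database. The following is a rough Python sketch of the same logic, not a replacement for the SQL: each call contributes +1 at its join time and -1 at its leave time, the deltas are summed per timestamp, and a running total is tracked in time order. The function name peak_concurrent_calls and the sample times are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

def peak_concurrent_calls(calls):
    """calls: iterable of (join_time, leave_time) datetimes.
    Returns {date: peak number of simultaneous calls seen on that date}."""
    deltas = defaultdict(int)
    for join_time, leave_time in calls:
        deltas[join_time] += 1    # a call starts
        deltas[leave_time] -= 1   # a call ends

    peaks = {}
    running = 0
    for moment in sorted(deltas):          # same role as ORDER BY call_time
        running += deltas[moment]          # same role as the running SUM() OVER
        day = moment.date()
        peaks[day] = max(peaks.get(day, 0), running)
    return peaks

sample_calls = [
    (datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 9, 15)),
    (datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 9, 15)),
    (datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 9, 15)),
    (datetime(2021, 3, 1, 9, 5), datetime(2021, 3, 1, 9, 20)),
    (datetime(2021, 3, 2, 14, 0), datetime(2021, 3, 2, 14, 30)),
]
peaks = peak_concurrent_calls(sample_calls)
for day, peak in sorted(peaks.items()):
    print(day, peak)
```

As in the SQL version, a call that ends at exactly the moment another one starts cancels out at that timestamp, so the two are not counted as overlapping.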

SQL question - how to create date + hour dimension table?

I would like to create a table showing hours 0 through 24 for each date since 1/1/2020 (until current). It would look something like this:
Column 1: Date from 1/1/2020 until current
Column 2: Hour 0-24, repeating for each date
Here is one way to do this using T-SQL:
with
tally as
(
select top 10000 n = row_number() over(order by (select null)) - 1 from sys.messages -- 1000 rows no longer covers every day since 1/1/2020
),
calendar as
(
select [Date] = cast(dateadd(d, n, '1/1/2020') as date) from tally where n < datediff(d, '1/1/2020', getdate())
),
[hours] as
(
select top 24 n from tally
)
--select * from tally;
--select * from calendar;
select [Date] = format(c.[Date], 'M/d/yyyy'), Hrs = h.n
from [hours] h
cross join calendar c
order by c.[Date], h.n;
The tally CTE creates rows with a zero based index.
The calendar CTE creates the dates between 1/1/2020 and today.
The hours CTE creates the hours from 0 through 23.
The final query creates a Cartesian Product of calendar and hours.
This is a query that generates the desired data. If the data is to be persisted in a table, then insert logic would need to be added to the final query.
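The same calendar-times-hours cross join can be sketched in SQLite through Python's sqlite3 module, for anyone who wants to experiment without SQL Server. This version is an assumption-laden variant rather than a port: it uses two recursive CTEs instead of a tally over sys.messages, treats the end date as inclusive, and the function name date_hour_rows is hypothetical.

```python
import sqlite3

def date_hour_rows(start_date, end_date):
    """Return [(date, hour)] for every hour 0..23 of every day in the inclusive range."""
    conn = sqlite3.connect(":memory:")
    rows = conn.execute("""
        WITH RECURSIVE
        calendar(d) AS (
            SELECT date(:start)
            UNION ALL
            SELECT date(d, '+1 day') FROM calendar WHERE d < date(:end)
        ),
        hours(h) AS (
            SELECT 0
            UNION ALL
            SELECT h + 1 FROM hours WHERE h < 23
        )
        SELECT c.d, h.h
        FROM calendar c
        CROSS JOIN hours h              -- Cartesian product: 24 rows per date
        ORDER BY c.d, h.h
    """, {"start": start_date, "end": end_date}).fetchall()
    conn.close()
    return rows

rows = date_hour_rows("2020-01-01", "2020-01-03")
print(len(rows), rows[0], rows[-1])
```

To persist the result, the final SELECT would feed an INSERT INTO a real table instead of being fetched.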

cross join to get all dates and hours and avoid duplicate values

We have 2 tables:
sales
hourt (only 1 field (hourt) of numbers: 0 to 23)
The goal is to list all dates and all 24 hours for each day and group hours that have sales. For hours that do not have sales, zero will be shown.
This query cross joins the sales table with the hourt table and does list all dates and 24 hours. However, there are also many duplicate rows. How can we avoid the duplicates?
We're using Amazon Redshift (based on Postgres 8.0).
with h as (
SELECT
a.purchase_date,
CAST(DATE_PART("HOUR", AT_TIME_ZONE(AT_TIME_ZONE(CAST(a.purchase_date AS
DATETIME), "0:00"), "PST")) as INTEGER) AS Hour,
COUNT(a.quantity) AS QtyCount,
SUM(a.quantity) AS QtyTotal,
SUM(a.price) AS Price
FROM sales a
GROUP BY CAST(DATE_PART("HOUR",
AT_TIME_ZONE(AT_TIME_ZONE(CAST(a.purchase_date AS DATETIME), "0:00"),
"PST")) as INTEGER),
DATE_FORMAT(AT_TIME_ZONE(AT_TIME_ZONE(CAST(a.purchase_date AS DATETIME),
"0:00"), "PST"), "yyyy-MM-dd")
ORDER by a.purchase_date
),
hr as (
SELECT
CAST(hourt AS INTEGER) AS hourt
FROM hourt
),
joined as (
SELECT
purchase_date,
hourt,
QtyCount,
QtyTotal,
Price
FROM h
cross JOIN hr
)
SELECT *
FROM joined
Order by purchase_date,hourt
Sample Tables:
Before the cross join, query returned correct sales and grouped hours, as seen in the below table.
Desired results table:
You need to create a series of all the hour values and left join your data back to that. Comments inline explain the logic.
WITH data AS (-- Do the basic aggregation first
SELECT DATE_TRUNC('hour',a.purchase_date) purchase_hour --Truncate timestamp to the hour is simpler
,COUNT(a.quantity) AS qty_count --Aliases match the names used in the final SELECT
,SUM(a.quantity) AS qty_total
,SUM(a.price) AS price
FROM sales a
GROUP BY DATE_TRUNC('hour',a.purchase_date)
ORDER BY DATE_TRUNC('hour',a.purchase_date)
-- SELECT '2017-01-13 12:00:00'::TIMESTAMP purchase_hour, 1 qty_count, 1 qty_total, 119 price
-- UNION ALL SELECT '2017-01-13 15:00:00'::TIMESTAMP purchase_hour, 1 qty_count, 1 qty_total, 119 price
-- UNION ALL SELECT '2017-01-14 21:00:00'::TIMESTAMP purchase_hour, 1 qty_count, 1 qty_total, 119 price
)
,time_range AS (--Calculate the start and end **date** values
SELECT DATE_TRUNC('day',MIN(purchase_hour)) start_date
, DATE_TRUNC('day',MAX(purchase_hour))+1 end_date
FROM data
)
,hr AS (--Generate all hours between start and end
SELECT (SELECT start_date
FROM time_range
LIMIT 1) --Limit 1 so the optimizer knows it's not a correlated subquery
+ ((n-1) --Make the series start at zero so we don't miss the starting value
* INTERVAL '1 hour') AS "hour"
FROM (SELECT ROW_NUMBER() OVER () n
FROM stl_query --Can use any table here as long as it enough rows
LIMIT 100) series
WHERE "hour" < (SELECT end_date FROM time_range LIMIT 1)
)
--Use NVL to replace missing values with zeroes
SELECT hr.hour AS purchase_hour --Timestamp like `2017-01-13 12:00:00`
, NVL(data.qty_count, 0) AS qty_count
, NVL(data.qty_total, 0) AS qty_total
, NVL(data.price, 0) AS price
FROM hr
LEFT JOIN data
ON hr.hour = data.purchase_hour
ORDER BY hr.hour
;
I achieved the desired results by using Left Join (table A with table B) instead of Cross Join of these two tables:
Table A has all the dates and hours
Table B is the first part of the original query

Hits per day in Google Big Query

I am using Google Big Query to find hits per day. Here is my query,
SELECT COUNT(*) AS Key,
DATE(EventDateUtc) AS Value
FROM [myDataSet.myTable]
WHERE .....
GROUP BY Value
ORDER BY Value DESC
LIMIT 1000;
This works fine, but it leaves out the dates with 0 hits. I want to include them. I cannot create a temp table in Google BigQuery. How can I fix this?
I tested the following and got the error: Field 'day' not found.
SELECT COUNT(*) AS Key,
DATE(t.day) AS Value from (
select date(date_add(day, i, "DAY")) day
from (select '2015-05-01 00:00' day) a
cross join
(select
position(
split(
rpad('', datediff(CURRENT_TIMESTAMP(),'2015-05-01 00:00')*2, 'a,'))) i
from (select NULL)) b
) d
left join [sample_data.requests] t on d.day = t.day
GROUP BY Value
ORDER BY Value DESC
LIMIT 1000;
A query can only return data that exists in your tables; it cannot guess which dates are missing. You need to handle this either in your programming language, or by joining with a numbers table and generating the dates on the fly.
If you know the date range you have in your query, you can generate the days:
select date(date_add(day, i, "DAY")) day
from (select '2015-01-01' day) a
cross join
(select
position(
split(
rpad('', datediff('2015-01-15','2015-01-01')*2, 'a,'))) i
from (select NULL)) b;
Then you can join this result with your query table:
SELECT COUNT(*) AS Key,
DATE(t.day) AS Value from (...the.above.query.pasted.here...) d
left join [myDataSet.myTable] t on d.day = t.day
WHERE .....
GROUP BY Value
ORDER BY Value DESC
LIMIT 1000;
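The other route mentioned above, filling the gaps in your programming language, could look roughly like this in Python. It is a sketch only: it assumes the query results have already been reduced to one date per hit, and the function name hits_per_day and the sample dates are invented for illustration.

```python
from collections import Counter
from datetime import date, timedelta

def hits_per_day(event_dates, start, end):
    """Return [(day, hits)] for every day from start to end inclusive,
    with 0 for days that have no events."""
    counts = Counter(event_dates)
    result = []
    day = start
    while day <= end:
        result.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return result

events = [date(2015, 5, 1), date(2015, 5, 1), date(2015, 5, 3)]
daily = hits_per_day(events, date(2015, 5, 1), date(2015, 5, 4))
for day, hits in daily:
    print(day, hits)
```

This keeps the query simple, at the cost of doing the zero-filling outside BigQuery.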

Summing up columns from two different tables

I have two different tables, FirewallLog and ProxyLog. There is no relation between these two tables. They have four common fields:
LogTime ClientIP BytesSent BytesRec
I need to calculate the total usage of a particular ClientIP for each day over a period of time (like last month) and display it like below:
Date TotalUsage
2/12 125
2/13 145
2/14 0
. .
. .
3/11 150
3/12 125
TotalUsage is SUM(FirewallLog.BytesSent + FirewallLog.BytesRec) + SUM(ProxyLog.BytesSent + ProxyLog.BytesRec) for that IP. I have to show Zero if there is no usage (no record) for that day.
I need to find the fastest solution to this problem. Any Ideas?
First, create a Calendar table: one that has, at the very least, an id column and a calendar_date column. Fill it with dates covering every day of every year you could ever be interested in. (You'll find that you end up adding flags for weekends, bank holidays and all sorts of other useful metadata about dates.)
Then you can LEFT JOIN on to that table, after combining your two tables with a UNION.
SELECT
CALENDAR.calendar_date,
JOINT_LOG.ClientIP,
ISNULL(SUM(JOINT_LOG.BytesSent + JOINT_LOG.BytesRec), 0) AS TotalBytes
FROM
CALENDAR
LEFT JOIN
(
SELECT LogTime, ClientIP, BytesSent, BytesRec FROM FirewallLog
UNION ALL
SELECT LogTime, ClientIP, BytesSent, BytesRec FROM ProxyLog
)
AS JOINT_LOG
ON JOINT_LOG.LogTime >= CALENDAR.calendar_date
AND JOINT_LOG.LogTime < CALENDAR.calendar_date+1
WHERE
CALENDAR.calendar_date >= @start_date
AND CALENDAR.calendar_date < @cease_date
GROUP BY
CALENDAR.calendar_date,
JOINT_LOG.ClientIP
SQL Server is very good at optimising this type of UNION ALL query, assuming that you have appropriate indexes.
If you don't have a calendar table, you can create one using a recursive CTE:
declare @startdate date = '2013-02-01';
declare @enddate date = '2013-03-01';
with dates as (
select @startdate as thedate
union all
select dateadd(day, 1, thedate)
from dates
where thedate < @enddate
)
select driver.thedate, driver.ClientIP,
coalesce(fwl.FWBytes, 0) + coalesce(pl.PLBytes, 0) as TotalBytes
from (select d.thedate, fwl.ClientIP
from dates d cross join
(select distinct ClientIP from FirewallLog) fwl
) driver left outer join
(select cast(fwl.logtime as date) as thedate, fwl.ClientIP,
SUM(fwl.BytesSent + fwl.BytesRec) as FWBytes
from FirewallLog fwl
group by cast(fwl.logtime as date), fwl.ClientIP
) fwl
on driver.thedate = fwl.thedate and driver.clientIP = fwl.ClientIP left outer join
(select cast(pl.logtime as date) as thedate, pl.ClientIP,
SUM(pl.BytesSent + pl.BytesRec) as PLBytes
from ProxyLog pl
group by cast(pl.logtime as date), pl.ClientIP
) pl
on driver.thedate = pl.thedate and driver.ClientIP = pl.ClientIP
This uses a driver table that generates all the combinations of IP and date, which it then uses for joining to the summarized table. This formulation assumes that the "FirewallLog" contains all the "ClientIp"s of interest.
This also breaks out the two values, in case you also want to include them (to see which is contributing more bytes to the total, for instance).
I would recommend creating a Dates Lookup table if that is an option. Create the table once and then you can use it as often as needed. If not, you'll need to look into creating a recursive CTE to act as the Dates table (easy enough; look on Stack Overflow for examples).
Select d.date,
results.ClientIp,
Sum(results.bytes)
From YourDateLookupTable d
Left Join (
Select ClientIp, logtime, BytesSent + BytesRec bytes From FirewallLog
Union All
Select ClientIp, logtime, BytesSent + BytesRec bytes From ProxyLog
) results On d.date = results.logtime
Group By d.date,
results.ClientIp
This assumes the logtime and date data types are the same. If logtime is a date time, you'll need to convert it to a date.
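To see the calendar plus UNION ALL approach end to end, here is a small sketch that runs in SQLite via Python's sqlite3 module. It is an illustration under assumptions, not the SQL Server query itself: the calendar is generated with a recursive CTE, date() does the datetime-to-date conversion, the sample rows are invented, and the function name daily_usage is hypothetical. The ClientIP filter sits in the ON clause so that days without traffic still come back with 0.

```python
import sqlite3

def daily_usage(firewall_rows, proxy_rows, client_ip, start_date, end_date):
    """Return [(date, total_bytes)] for one client over an inclusive date range."""
    conn = sqlite3.connect(":memory:")
    for table in ("FirewallLog", "ProxyLog"):
        conn.execute(
            f"CREATE TABLE {table} "
            "(LogTime TEXT, ClientIP TEXT, BytesSent INTEGER, BytesRec INTEGER)"
        )
    conn.executemany("INSERT INTO FirewallLog VALUES (?, ?, ?, ?)", firewall_rows)
    conn.executemany("INSERT INTO ProxyLog VALUES (?, ?, ?, ?)", proxy_rows)
    rows = conn.execute("""
        WITH RECURSIVE calendar(d) AS (
            SELECT date(:start)
            UNION ALL
            SELECT date(d, '+1 day') FROM calendar WHERE d < date(:end)
        ),
        joint_log AS (
            SELECT LogTime, ClientIP, BytesSent, BytesRec FROM FirewallLog
            UNION ALL
            SELECT LogTime, ClientIP, BytesSent, BytesRec FROM ProxyLog
        )
        SELECT c.d, COALESCE(SUM(j.BytesSent + j.BytesRec), 0)
        FROM calendar c
        LEFT JOIN joint_log j
               ON date(j.LogTime) = c.d     -- strip the time part before matching
              AND j.ClientIP = :ip          -- filter here, not in WHERE
        GROUP BY c.d
        ORDER BY c.d
    """, {"start": start_date, "end": end_date, "ip": client_ip}).fetchall()
    conn.close()
    return rows

firewall = [("2013-02-12 10:00:00", "10.0.0.1", 100, 25)]
proxy = [
    ("2013-02-13 08:00:00", "10.0.0.1", 100, 45),
    ("2013-02-13 09:00:00", "10.0.0.2", 1, 1),
]
usage = daily_usage(firewall, proxy, "10.0.0.1", "2013-02-12", "2013-02-14")
for day, total in usage:
    print(day, total)
```

Moving the ClientIP condition into a WHERE clause would discard the rows where the join found nothing, which are exactly the zero-usage days the question asks for.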