SQL Server sampling large volume of data per hour

I'm using SQL Server 2016, and have a very large table containing millions of rows of data from different sources at irregular intervals over several years. The table cannot be altered. Typical data looks like this -
Reading_ID  Source  Date                 Reading
==========  ======  ===================  =======
1           1       2023/01/01 00:04:00  7
2           1       2023/01/01 00:10:00  3
3           2       2023/01/01 00:15:00  8
4           1       2023/01/01 01:00:00  2
5           2       2023/01/01 01:03:00  15
The table has CONSTRAINT [PK_DATA_READINGS] PRIMARY KEY CLUSTERED ([Source] ASC, [Date] ASC). The Source can be any number; it's not fixed or known in advance. New sources can start at any time.
What I want to do is specify a date range and an interval in hours, then get just one reading from each source every X hours. E.g. in the above, row 2 wouldn't be returned as it's too close to row 1.
I've tried something like the following -
DECLARE @Start_Date DATETIME = '2023/01/01 00:00:00',
        @End_Date DATETIME = '2023/02/01 00:00:00',
        @Interval_Hours INT = 4
;WITH HOURLY_DATA AS (
SELECT d.Source,
d.Date,
d.Reading,
ROW_NUMBER() OVER (PARTITION BY d.Source, DATEDIFF(HOUR, @Start_Date, d.Date) / @Interval_Hours ORDER BY d.Source, d.Date) AS SOURCE_HOUR_ROW
FROM data_readings d
WHERE d.Date BETWEEN @Start_Date AND @End_Date
)
SELECT h.Source,
h.Date,
h.Reading
FROM HOURLY_DATA h
WHERE h.SOURCE_HOUR_ROW = 1
But it's still very slow to execute, sometimes taking 5 minutes or more to complete. I would like a faster way to get this data. I've looked at the execution plan, but can't see an obvious solution.
Thank you for looking.

You say the Source column has no table that it correlates to. This significantly worsens performance options, as it means you have no way of skipping through your (Source, Date) index by date.
Ideally you would have a table containing a list of possible Source values using a foreign-key relationship. There is no reason why you couldn't update this dynamically.
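For illustration, such a dynamically maintained lookup might look like the sketch below. The dbo.Sources table and constraint names are assumptions, not part of the original schema, and since the base table cannot be altered, this skips the foreign key itself and just tops the lookup up periodically:
CREATE TABLE dbo.Sources
(
    Source INT NOT NULL,
    CONSTRAINT PK_Sources PRIMARY KEY CLUSTERED (Source)
);

-- Top it up periodically (or from the ingestion process)
INSERT INTO dbo.Sources (Source)
SELECT DISTINCT dr.Source
FROM dbo.data_readings dr
WHERE NOT EXISTS (SELECT 1 FROM dbo.Sources s WHERE s.Source = dr.Source);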
However, you can hack it with an indexed view.
CREATE VIEW dbo.vAllSources
WITH SCHEMABINDING
AS
SELECT
    dr.Source,
    COUNT_BIG(*) AS [Count]
FROM dbo.data_readings dr
GROUP BY
    dr.Source;

CREATE UNIQUE CLUSTERED INDEX UX_AllSources ON dbo.vAllSources (Source);
The server will efficiently maintain this index based off the original table.
Then you can do a simple join. Use the NOEXPAND hint to force it to use the index.
DECLARE @Start_Date DATETIME = '20230101 00:00:00',
        @End_Date DATETIME = '20230201 00:00:00',
        @Interval_Hours INT = 4;
WITH HOURLY_DATA AS (
SELECT
d.Source,
d.Date,
d.Reading,
ROW_NUMBER() OVER (PARTITION BY d.Source, DATEDIFF(HOUR, @Start_Date, d.Date) / @Interval_Hours ORDER BY d.Date) AS SOURCE_HOUR_ROW
FROM dbo.vAllSources s WITH (NOEXPAND)
JOIN data_readings d
ON s.Source = d.Source
AND d.Date BETWEEN @Start_Date AND @End_Date
)
SELECT h.Source,
h.Date,
h.Reading
FROM HOURLY_DATA h
WHERE h.SOURCE_HOUR_ROW = 1;
Note that BETWEEN on date values is generally not recommended, as it implies >= AND <=. You are far better off using a half-open interval:
AND d.Date >= @Start_Date AND d.Date < @End_Date
You should also use non-ambiguous date formats.
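For example, these two literal styles are read the same way under any session language or DATEFORMAT setting:
DECLARE @a DATETIME = '20230101';            -- unseparated yyyymmdd is always unambiguous
DECLARE @b DATETIME = '2023-02-01T00:00:00'; -- ISO 8601 with the T separator is too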

The slowness is caused by the volume of data in the CTE.
I found this solution, which works faster: How to sample records by time

Combining WITH, UNION and OR in JOIN

I have a national database of all hospital records, and another national database of infection events. I am looking to extract all relevant hospital events for the infection, but I am struggling to find a way to optimise the query. OtherData is a proxy for another set of columns.
The infection data looks like this:
UniqueID  PatientNumber  HospitalNumber  Date        OtherData
========  =============  ==============  ==========  =========
14000000  1234           BAC             2022-01-27  DELTA
12007927  5412           HSA             2022-01-20  OMICRON
1         7862           UDO             2020-02-01  ALPHA
The hospital data looks like this:
EpisodeID  PatientNumber  HospitalNumber  StartDate   EndDate     OtherData
=========  =============  ==============  ==========  ==========  =========
4          1234                           2022-01-25  NA          ICU
987213     5412                           2022-01-20  2022-01-27  DIED
3                         BAC             2021-11-20  2022-01-20  DISCHARGED
3                         BAC             2020-01-29  2022-02-10  DISCHARGED
The data can be missing lots of fields, and I have two identifiers (national and local) I can use to link the data. I query against both using UNION. But because these are national registers, and I'm dealing with Covid-19 data, we are talking about linking tens of millions of records (on separate servers). In order to minimise the amount of hospital data pulled in, I am attempting to link between date ranges of the infection.
My query as it stands is below. It took 8 minutes pulling a single UniqueID when I tested it, and I have tens of millions.
I'm not sure if I should be using an AND (X OR Y) with the dates in the INNER JOIN, or if I should have them separated and use two more UNIONs.
This data is ingested into R for further processing and analytics.
Help appreciated!
DECLARE @days AS INT = 28;
WITH
infections AS (
SELECT UniqueID
,PatientNumber
,HospitalNumber
,Date
,OtherData
FROM infections
),
link_tbl AS (
SELECT
i.UniqueID
,h.EpisodeID
,h.PatientNumber
,h.HospitalNumber
,h.StartDate
,h.EndDate
FROM infections i
INNER JOIN hospital h
ON i.PatientNumber = h.PatientNumber
AND (h.EndDate
BETWEEN CONVERT(date, DATEADD(DAY, -@days, i.Date))
AND CONVERT(date, DATEADD(DAY, @days, i.Date))
OR h.StartDate
BETWEEN CONVERT(date, DATEADD(DAY, -@days, i.Date))
AND CONVERT(date, DATEADD(DAY, @days, i.Date))
)
UNION
SELECT
i.UniqueID
,h.EpisodeID
,h.PatientNumber
,h.HospitalNumber
,h.StartDate
,h.EndDate
FROM infections i
INNER JOIN hospital h
ON i.HospitalNumber = h.HospitalNumber
AND (h.EndDate
BETWEEN CONVERT(date, DATEADD(DAY, -@days, i.Date))
AND CONVERT(date, DATEADD(DAY, @days, i.Date))
OR h.StartDate
BETWEEN CONVERT(date, DATEADD(DAY, -@days, i.Date))
AND CONVERT(date, DATEADD(DAY, @days, i.Date))
)
)
SELECT
hospital.allothervars* ---(these are named)
,infections.* ---named in query
FROM hospital
INNER JOIN link_tbl ON hospital.EpisodeID = link_tbl.EpisodeID
I deal with data on a similar scale with similar use cases. One thing I found is that filtering by an unindexed date is VERY slow. We got around it by finding the index for a specific date and using that as a subquery.
So as an example :
SELECT Id,
    Date,
    Value
FROM Table
WHERE Id >= (SELECT MIN(Id) FROM Table WHERE Date = '01-01-1970')
rather than doing
SELECT Id,
    Date,
    Value
FROM Table
WHERE Date >= '01-01-1970'
Indexing datetime data creates a pretty big file size, so most of the time it's not indexed. The SQL Execution engine can do a much better job sorting on primary keys rather than datetimes.
What works for me is generating a big list of the min and max IDs for a date range grouped by the day. Then whenever I need to do an ad hoc request or do something that has a bound StartDate but not an EndDate, you can just hard code the ID and do a greater than operator for your filter.
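A rough sketch of that lookup follows. The DailyIdRanges name is invented, and it assumes Id is an ever-increasing key:
-- One row per day with the ID boundaries for that day
SELECT CAST([Date] AS date) AS [Day],
    MIN(Id) AS MinId,
    MAX(Id) AS MaxId
INTO dbo.DailyIdRanges
FROM [Table]
GROUP BY CAST([Date] AS date);

-- An ad hoc query with a bound start date but no end date:
SELECT Id, [Date], Value
FROM [Table]
WHERE Id >= (SELECT MinId FROM dbo.DailyIdRanges WHERE [Day] = '1970-01-01');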

For each minute add row between ranges

I have a table with date ranges,
and I need to add rows between these ranges. I must granulate this table to minutes. How can I add these extra rows?
The recursive CTE option from @MatthewBaker would only need minor changes to meet your needs.
WITH by_minute AS
(
    SELECT datetime_from, datetime_to, datetime_from AS minute_marker
    FROM your_table
    UNION ALL
    SELECT datetime_from, datetime_to, DATEADD(minute, 1, minute_marker)
    FROM by_minute
    WHERE DATEADD(minute, 1, minute_marker) < datetime_to
)
SELECT *
FROM by_minute
OPTION (MAXRECURSION 0)
The OPTION (MAXRECURSION 0) allows SQL Server to keep recursively generating the minutes beyond the default of 100. Still, I would not recommend this if the intervals being generated are more than a few hundred minutes long (maybe up to one day [1440 minutes]).
In such a case the simpler approach would be to utilise a table of numbers, and simply join on to that.
An example for creating such a table could be : https://www.mssqltips.com/sqlservertip/4176/the-sql-server-numbers-table-explained--part-1/
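For reference, one minimal way to build such a table is sketched below (the linked article covers sizing and indexing in more depth; the ISNULL wrapper just makes the column non-nullable so it can take a primary key):
-- Build dbo.Numbers with values 0 to 999,999 from a cross join of system rows
SELECT TOP (1000000)
    ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, 0) AS Number
INTO dbo.Numbers
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;

ALTER TABLE dbo.Numbers
    ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number);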
From there, you just join on the number of rows that you need...
SELECT
yourTable.*,
DATEADD(minute, Numbers.[Number], yourTable.datetime_from) AS minute_marker
FROM
yourTable
INNER JOIN
dbo.Numbers
ON Numbers.[Number] >= 0
AND Numbers.[Number] < DATEDIFF(minute, yourTable.datetime_from, yourTable.datetime_to)
Another recommendation I have is to NOT use the 59th second to represent the end of a minute. What if you get data at 59.600 seconds? That's after the end of the minute, but before the start of the next one. Instead, use markers that are Inclusive Start and Exclusive End...
The first minute of 2012 = '2012-01-01 00:00:00.000' -> '2012-01-01 00:01:00.000'
The final minute of 2012 = '2012-12-31 23:59:00.000' -> '2013-01-01 00:00:00.000'
With such a structure you only ever need my_point_in_time >= start AND my_point_in_time < end, and you never need to worry about the precision of the datatypes being used.
(It also matches human natural language. When we say things like between 1 and 2 we most often mean >= 1 AND < 2.)
If you use the following:
WITH cte
AS (SELECT CAST('2017-01-01 00:00:00' AS DATETIME) AS startTime
UNION ALL
SELECT DATEADD(MINUTE, 1, startTime)
FROM cte
WHERE startTime < '2017-01-02 00:00:00'
)
SELECT *
FROM cte
OPTION (MAXRECURSION 0)
It will give you a minute-by-minute result. Substitute in the range you want. You can then use that as a basis to write an insert. Recursive CTEs aren't the most efficient, but they're probably the easiest.

trying to find the maximum number of occurrences over time T-SQL

I have data recording the StartDateTime and EndDateTime (both DATETIME2) of a process for all of the year 2013.
My task is to find the maximum number of times the process was being run at any specific time throughout the year.
I have written some code to check every minute/second how many processes were running at the specific time, but this takes a very long time and it would be impossible to let it run for the whole year.
Here is the code (in this case, checking every minute for the date 25/10/2013):
CREATE TABLE #Hit
(
    ID INT IDENTITY (1,1) PRIMARY KEY,
    Moment DATETIME2,
    [Count] INT
)
DECLARE @moment DATETIME2
SET @moment = '2013-10-24 00:00:00'
WHILE @moment < '2013-10-25'
BEGIN
    INSERT INTO #Hit ( Moment, [Count] )
    SELECT @moment, COUNT(*)
    FROM dbo.tblProcessTimeLog
    WHERE ProcessFK IN (25)
    AND @moment BETWEEN StartDateTime AND EndDateTime
    AND DelInd = 0
    PRINT @moment
    SET @moment = DATEADD(MINUTE, 1, @moment)
END
SELECT * FROM #Hit
ORDER BY [Count] DESC
Can anyone think how I could get a similar result (I just need the maximum number of processes being run at any given time), but for the whole year?
Thanks
DECLARE @d DATETIME = '20130101'; -- the first day of the year you care about
;WITH m(m) AS
( -- all the minutes in a day
SELECT TOP (1440) ROW_NUMBER() OVER (ORDER BY number) - 1
FROM master..spt_values
),
d(d) AS
( -- all the days in *that* year (accounts for leap years vs. hard-coding 365)
SELECT TOP (DATEDIFF(DAY, @d, DATEADD(YEAR, 1, @d))) DATEADD(DAY, number, @d)
FROM master..spt_values WHERE type = N'P' ORDER BY number
),
x AS
( -- all the minutes in *that* year
SELECT moment = DATEADD(MINUTE, m.m, d.d) FROM m CROSS JOIN d
)
SELECT TOP (1) WITH TIES -- in case more than one at the top
x.moment, [COUNT] = COUNT(l.ProcessFK)
FROM x
INNER JOIN dbo.tblProcessTimeLog AS l
ON x.moment >= l.StartDateTime
AND x.moment <= l.EndDateTime
WHERE l.ProcessFK = 25 AND l.DelInd = 0
GROUP BY x.moment
ORDER BY [COUNT] DESC;
See this post for why I don't think you should use BETWEEN for range queries, even in cases where it does semantically do what you want.
Create a table T whose rows represent some time segments. This table could well be a temporary table (depending on your case). Say:
row 1 - [from=00:00:00, to=00:00:01)
row 2 - [from=00:00:01, to=00:00:02)
row 3 - [from=00:00:02, to=00:00:03)
and so on. Then just join from your main table (tblProcessTimeLog, I think) to this table based on the datetime values recorded in tblProcessTimeLog. A year has just about half a million minutes, so it is not that many rows to store in T.
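A sketch of that join, assuming one-minute segments in a temp table (the segment column names are invented):
CREATE TABLE #T (seg_from DATETIME2 NOT NULL, seg_to DATETIME2 NOT NULL);
-- ...populate #T with one row per minute of interest, then:
SELECT t.seg_from, COUNT(*) AS running
FROM #T t
JOIN dbo.tblProcessTimeLog l
    ON l.StartDateTime < t.seg_to
    AND l.EndDateTime >= t.seg_from
WHERE l.ProcessFK = 25 AND l.DelInd = 0
GROUP BY t.seg_from
ORDER BY running DESC;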
I recently pulled some code from SO trying to solve the 'islands and gaps' problem, and the algorithm for that should help you solve your problem.
The idea is that you want to find the point in time that has the most started processes, much like figuring out the deepest nesting of parenthesis in an expression:
( ( ( ) ( ( ( (deepest here, 6))))))
This sql will produce this result for you (I included a temp table with sample data):
/*
CREATE TABLE #tblProcessTimeLog
(
StartDateTime DATETIME2,
EndDateTime DATETIME2
)
-- delete from #tblProcessTimeLog
INSERT INTO #tblProcessTimeLog (StartDateTime, EndDateTime)
Values ('1/1/2012', '1/6/2012'),
('1/2/2012', '1/6/2012'),
('1/3/2012', '1/6/2012'),
('1/4/2012', '1/6/2012'),
('1/5/2012', '1/7/2012'),
('1/6/2012', '1/8/2012'),
('1/6/2012', '1/10/2012'),
('1/6/2012', '1/11/2012'),
('1/10/2012', '1/12/2012'),
('1/15/2012', '1/16/2012')
;
*/
with cteProcessGroups (EventDate, GroupId) as
(
select EVENT_DATE, (E.START_ORDINAL - E.OVERALL_ORDINAL) GROUP_ID
FROM
(
select EVENT_DATE, EVENT_TYPE,
MAX(START_ORDINAL) OVER (ORDER BY EVENT_DATE, EVENT_TYPE ROWS UNBOUNDED PRECEDING) as START_ORDINAL,
ROW_NUMBER() OVER (ORDER BY EVENT_DATE, EVENT_TYPE) AS OVERALL_ORDINAL
from
(
Select StartDateTime AS EVENT_DATE, 1 as EVENT_TYPE, ROW_NUMBER() OVER (ORDER BY StartDateTime) as START_ORDINAL
from #tblProcessTimeLog
UNION ALL
select EndDateTime, 0 as EVENT_TYPE, NULL
FROM #tblProcessTimeLog
) RAWDATA
) E
)
select Max(EventDate) as EventDate, count(GroupId) as OpenProcesses
from cteProcessGroups
group by (GroupId)
order by COUNT(GroupId) desc
Results:
EventDate OpenProcesses
2012-01-05 00:00:00.0000000 5
2012-01-06 00:00:00.0000000 4
2012-01-15 00:00:00.0000000 2
2012-01-10 00:00:00.0000000 2
2012-01-08 00:00:00.0000000 1
2012-01-07 00:00:00.0000000 1
2012-01-11 00:00:00.0000000 1
2012-01-06 00:00:00.0000000 1
2012-01-06 00:00:00.0000000 1
2012-01-06 00:00:00.0000000 1
2012-01-16 00:00:00.0000000 1
Note that the 'in-between' rows don't give anything meaningful. Basically this output is only tuned to tell you when the most activity was. Looking at the other rows in the output, there wasn't just 1 process running on 1/8 (there were actually 3). But the way this code works is that by grouping the processes that are concurrent together in a group, you can count the number of simultaneous processes. The date returned is when the max concurrent processes began. It doesn't tell you how long they were going on for, but you can solve that with an additional query. (Once you know the date the most were occurring, you can find out the specific processes by using a BETWEEN statement on the date, as sketched below.)
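That follow-up might look roughly like this, using the peak date found above (a sketch; the sample table has no process ID column, so this just returns the matching rows):
SELECT *
FROM #tblProcessTimeLog
WHERE '2012-01-05' BETWEEN StartDateTime AND EndDateTime;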
Hope this helps.

Select data from SQL DB per day

I have a table with order information in an E-commerce store. Schema looks like this:
[Orders]
Id|SubTotal|TaxAmount|ShippingAmount|DateCreated
This table does only contain data for every Order. So if a day goes by without any orders, no sales data is there for that day.
I would like to select subtotal-per-day for the last 30 days, including those days with no sales.
The resultset would look like this:
Date | SalesSum
2009-08-01 | 15235
2009-08-02 | 0
2009-08-03 | 340
2009-08-04 | 0
...
Doing this, only gives me data for those days with orders:
select DateCreated as Date, sum(ordersubtotal) as SalesSum
from Orders
group by DateCreated
You could create a table called Dates, and select from that table and join the Orders table. But I really want to avoid that, because it doesn't work well enough when dealing with different time zones and things...
Please don't laugh. SQL is not my kind of thing... :)
Create a function that can generate a date table as follows:
(stolen from http://www.codeproject.com/KB/database/GenerateDateTable.aspx)
Create Function dbo.fnDateTable
(
    @StartDate datetime,
    @EndDate datetime,
    @DayPart char(5) -- support 'day','month','year','hour', default 'day'
)
Returns @Result Table
(
    [Date] datetime
)
As
Begin
    Declare @CurrentDate datetime
    Set @CurrentDate = @StartDate
    While @CurrentDate <= @EndDate
    Begin
        Insert Into @Result Values (@CurrentDate)
        Select @CurrentDate =
            Case
                When @DayPart='year' Then DateAdd(yy,1,@CurrentDate)
                When @DayPart='month' Then DateAdd(mm,1,@CurrentDate)
                When @DayPart='hour' Then DateAdd(hh,1,@CurrentDate)
                Else DateAdd(dd,1,@CurrentDate)
            End
    End
    Return
End
Then, join against that table
SELECT dates.Date as Date, sum(SubTotal+TaxAmount+ShippingAmount)
FROM dbo.fnDateTable(dateadd(month,-1,CONVERT(VARCHAR(10),GETDATE(),111)),CONVERT(VARCHAR(10),GETDATE(),111),'day') dates
LEFT JOIN Orders
ON dates.Date = DateCreated
GROUP BY dates.Date
declare @oldest_date datetime
declare @daily_sum numeric(18,2)
declare @temp table(
    sales_date datetime,
    sales_sum numeric(18,2)
)
select @oldest_date = dateadd(day,-30,getdate())
while @oldest_date <= getdate()
begin
    set @daily_sum = (select sum(SubTotal) from SalesTable where DateCreated = @oldest_date)
    insert into @temp(sales_date, sales_sum) values(@oldest_date, @daily_sum)
    set @oldest_date = dateadd(day,1,@oldest_date)
end
select * from @temp
OK - I missed the 'last 30 days' part. The bit above, while not as clean as the date table, IMHO, should work. Another variant would be to use the while loop to fill a temp table with just the last 30 days and do a left outer join with the result of my original query.
including those days with no sales.
That's the difficult part. I don't think the first answer will help you with that. I did something similar to this with a separate date table.
You can find the directions on how to do so here:
Date Table
I have a Log table with LogID, an index from which I never delete any records; it has IDs from 1 to ~10000000. Using this table I can write:
select
s.ddate, SUM(isnull(o.SubTotal,0))
from
(
select
cast(datediff(d,LogID,getdate()) as datetime) AS ddate
from
Log
where
LogID <31
) s right join orders o on o.orderdate = s.ddate
group by s.ddate
I actually did this today. We've also got an e-commerce application. I don't want to fill our database with "useless" dates. I just do the group by, create all the days for the last N days in Java, and pair them with the date/sales results from the database.
Where is this ultimately going to end up? I ask only because it may be easier to fill in the empty days with whatever program is going to deal with the data instead of trying to get it done in SQL.
SQL is a wonderful language, and it is capable of a great many things, but sometimes you're just better off working the finer points of the data in the program instead.
(Revised a bit--I hit enter too soon)
I started poking at this, and as it hits some pretty tricky SQL concepts it quickly grew into the following monster. If feasible, you might be better off adapting THEn's solution; or, like many others advise, using application code to fill in the gaps could be preferable.
-- A temp table holding the 30 dates that you want to check
DECLARE @Foo TABLE (Date smalldatetime not null)
-- Populate the table using a common "tally table" methodology (I got this from SQL Server magazine long ago)
;WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows
L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows
L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows
Tally AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS Number FROM L3)
INSERT @Foo (Date)
select dateadd(dd, datediff(dd, 0, dateadd(dd, -number + 1, getdate())), 0)
from Tally
where Number < 31
Step 1 is to build a temp table containing the 30 dates that you are concerned with. That abstract weirdness is about the fastest way known to build a table of consecutive integers; add a few more subqueries, and you can populate millions or more in mere seconds. I take the first 30, and use dateadd and the current date/time to convert them into dates. If you already have a "fixed" table that has 1-30, you can use that and skip the CTE entirely (by replacing table "Tally" with your table).
The outer two date function calls remove the time portion of the generated date.
(Note that I assume that your order date also has no time portion -- otherwise you've got another common problem to resolve.)
For testing purposes I built table #Orders, and this gets you the rest:
SELECT f.Date, sum(ordersubtotal) as SalesSum
from @Foo f
left outer join #Orders o
on o.DateCreated = f.Date
group by f.Date
I created the Function DateTable as JamesMLV pointed out to me.
And then the SQL looks like this:
SELECT dates.date, ISNULL(SUM(ordersubtotal), 0) as Sales FROM [dbo].[DateTable] ('2009-08-01','2009-08-31','day') dates
LEFT JOIN Orders ON CONVERT(VARCHAR(10),Orders.datecreated, 111) = dates.date
group by dates.date
SELECT DateCreated,
SUM(SubTotal) AS SalesSum
FROM Orders
GROUP BY DateCreated

SQL for counting events by date

I feel like I've seen this question asked before, but neither the SO search nor google is helping me... maybe I just don't know how to phrase the question. I need to count the number of events (in this case, logins) per day over a given time span so that I can make a graph of website usage. The query I have so far is this:
select
count(userid) as numlogins,
count(distinct userid) as numusers,
convert(varchar, entryts, 101) as date
from
usagelog
group by
convert(varchar, entryts, 101)
This does most of what I need (I get a row per date as the output containing the total number of logins and the number of unique users on that date). The problem is that if no one logs in on a given date, there will not be a row in the dataset for that date. I want it to add in rows indicating zero logins for those dates. There are two approaches I can think of for solving this, and neither strikes me as very elegant.
Add a column to the result set that lists the number of days between the start of the period and the date of the current row. When I'm building my chart output, I'll keep track of this value and if the next row is not equal to the current row plus one, insert zeros into the chart for each of the missing days.
Create a "date" table that has all the dates in the period of interest and outer join against it. Sadly, the system I'm working on already has a table for this purpose that contains a row for every date far into the future... I don't like that, and I'd prefer to avoid using it, especially since that table is intended for another module of the system and would thus introduce a dependency on what I'm developing currently.
Any better solutions or hints at better search terms for google? Thanks.
Frankly, I'd do this programmatically when building the final output. You're essentially trying to read something from the database which is not there (data for days that have no data). SQL isn't really meant for that sort of thing.
If you really want to do that, though, a "date" table seems your best option. To make it a bit nicer, you could generate it on the fly, using, e.g., your DB's date functions and a derived table.
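For instance, a derived set of the last 30 days can be generated inline with a small recursive CTE and outer-joined to the question's usagelog table (a sketch, not tested against the real schema):
WITH dates AS
(
    SELECT CAST(DATEADD(DAY, -29, GETDATE()) AS date) AS d
    UNION ALL
    SELECT DATEADD(DAY, 1, d) FROM dates WHERE d < CAST(GETDATE() AS date)
)
SELECT d.d AS [date],
    COUNT(u.userid) AS numlogins,
    COUNT(DISTINCT u.userid) AS numusers
FROM dates d
LEFT JOIN usagelog u
    ON u.entryts >= d.d
    AND u.entryts < DATEADD(DAY, 1, d.d)
GROUP BY d.d;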
I had to do exactly the same thing recently. This is how I did it in T-SQL (YMMV on speed, but I've found it performant enough over a couple million rows of event data):
DECLARE @DaysTable TABLE ( [Year] INT, [Day] INT )
DECLARE @StartDate DATETIME
SET @StartDate = whatever
WHILE (@StartDate <= GETDATE())
BEGIN
    INSERT INTO @DaysTable ( [Year], [Day] )
    SELECT DATEPART(YEAR, @StartDate), DATEPART(DAYOFYEAR, @StartDate)
    SELECT @StartDate = DATEADD(DAY, 1, @StartDate)
END
-- This gives me a table of all days since whenever
-- (you could select @StartDate as the minimum date of your usage log)
SELECT days.Year, days.Day, events.NumEvents
FROM @DaysTable AS days
LEFT JOIN (
    SELECT
        COUNT(*) AS NumEvents,
        DATEPART(YEAR, LogDate) AS [Year],
        DATEPART(DAYOFYEAR, LogDate) AS [Day]
    FROM LogData
    GROUP BY
        DATEPART(YEAR, LogDate),
        DATEPART(DAYOFYEAR, LogDate)
) AS events ON days.Year = events.Year AND days.Day = events.Day
Create a memory table (a table variable) where you insert your date ranges, then outer join the logins table against it. Group by your start date, then you can perform your aggregations and calculations.
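A minimal sketch of that approach (the variable names are mine):
DECLARE @Dates TABLE ([Date] date PRIMARY KEY);
DECLARE @d date = DATEADD(DAY, -29, GETDATE());
WHILE @d <= CAST(GETDATE() AS date)
BEGIN
    INSERT @Dates ([Date]) VALUES (@d);
    SET @d = DATEADD(DAY, 1, @d);
END
-- Then LEFT JOIN usagelog to @Dates and aggregate, as in the other answers.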
The strategy I normally use is to UNION with the opposite of the query, generally a query that retrieves data for rows that don't exist.
If I wanted to get the average mark for a course, but some courses weren't taken by any students, I'd need to UNION with those not taken by anyone to display a row for every class:
SELECT AVG(mark), course FROM `marks`
UNION
SELECT NULL, course FROM courses WHERE course NOT IN
(SELECT course FROM marks)
Your query will be more complex but the same principle should apply. You may indeed need a table of dates for your second query
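Applied to the login counts, that principle would look something like this (a sketch, using NOT EXISTS rather than NOT IN, and assuming a dates table as noted):
SELECT CONVERT(varchar(10), entryts, 101) AS [date], COUNT(userid) AS numlogins
FROM usagelog
GROUP BY CONVERT(varchar(10), entryts, 101)
UNION
SELECT CONVERT(varchar(10), d.[Date], 101), 0
FROM dates d
WHERE NOT EXISTS (
    SELECT 1 FROM usagelog u
    WHERE CONVERT(varchar(10), u.entryts, 101) = CONVERT(varchar(10), d.[Date], 101)
);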
Option 1
You can create a temp table and insert dates with the range and do a left outer join with the usagelog
Option 2
You can programmatically insert the missing dates while evaluating the result set to produce the final output
WITH q(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
qq(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
dates AS
(
SELECT q.n * 100 + qq.n AS ndate
FROM q, qq
)
SELECT COUNT(userid) as numlogins,
COUNT(DISTINCT userid) as numusers,
CAST('2000-01-01' AS DATETIME) + ndate as date
FROM dates
LEFT JOIN
usagelog
ON entryts >= CAST('2000-01-01' AS DATETIME) + ndate
AND entryts < CAST('2000-01-01' AS DATETIME) + ndate + 1
GROUP BY
ndate
This will select up to 10,000 dates constructed on the fly, which should be enough for about 30 years.
SQL Server has a limitation of 100 recursions per CTE, that's why the inner queries can return up to 100 rows each.
If you need more than 10,000, just add a third CTE qqq(n) and cross-join with it in dates.
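That extension would look roughly like this (an untested sketch; it scales the multipliers so the three CTEs combine into up to 1,000,000 values):
qqq(n) AS
(
    SELECT 0
    UNION ALL
    SELECT n + 1
    FROM qqq
    WHERE n < 99
),
dates AS
(
    SELECT q.n * 10000 + qq.n * 100 + qqq.n AS ndate
    FROM q, qq, qqq
)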