SQL for counting events by date - sql

I feel like I've seen this question asked before, but neither the SO search nor google is helping me... maybe I just don't know how to phrase the question. I need to count the number of events (in this case, logins) per day over a given time span so that I can make a graph of website usage. The query I have so far is this:
select
count(userid) as numlogins,
count(distinct userid) as numusers,
convert(varchar, entryts, 101) as date
from
usagelog
group by
convert(varchar, entryts, 101)
This does most of what I need (I get a row per date as the output containing the total number of logins and the number of unique users on that date). The problem is that if no one logs in on a given date, there will not be a row in the dataset for that date. I want it to add in rows indicating zero logins for those dates. There are two approaches I can think of for solving this, and neither strikes me as very elegant.
Add a column to the result set that lists the number of days between the start of the period and the date of the current row. When I'm building my chart output, I'll keep track of this value and if the next row is not equal to the current row plus one, insert zeros into the chart for each of the missing days.
Create a "date" table that has all the dates in the period of interest and outer join against it. Sadly, the system I'm working on already has a table for this purpose that contains a row for every date far into the future... I don't like that, and I'd prefer to avoid using it, especially since that table is intended for another module of the system and would thus introduce a dependency on what I'm developing currently.
Any better solutions or hints at better search terms for google? Thanks.

Frankly, I'd do this programmatically when building the final output. You're essentially trying to read something from the database which is not there (data for days that have no data). SQL isn't really meant for that sort of thing.
If you really want to do that, though, a "date" table seems your best option. To make it a bit nicer, you could generate it on the fly, using e.g. your DB's date functions and a derived table.
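As a rough illustration of filling the gaps while building the final output, here is a minimal Python sketch. It assumes the query result has already been fetched as (day, numlogins, numusers) tuples with the date parsed into a `datetime.date`; the function name `fill_missing_days` and the sample rows are made up for the example.

```python
from datetime import date, timedelta

def fill_missing_days(rows, start, end):
    """Return one (day, numlogins, numusers) tuple per day in [start, end].

    Days that are absent from `rows` get zero logins and zero users.
    """
    by_day = {day: (numlogins, numusers) for day, numlogins, numusers in rows}
    filled = []
    day = start
    while day <= end:
        numlogins, numusers = by_day.get(day, (0, 0))
        filled.append((day, numlogins, numusers))
        day += timedelta(days=1)
    return filled

# Hypothetical query output: nobody logged in on March 2nd or 4th.
rows = [(date(2009, 3, 1), 5, 3), (date(2009, 3, 3), 2, 2)]
for row in fill_missing_days(rows, date(2009, 3, 1), date(2009, 3, 4)):
    print(row)
```

The database query stays unchanged; only the charting code needs to know the start and end of the period.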

I had to do exactly the same thing recently. This is how I did it in T-SQL (YMMV on speed, but I've found it performant enough over a coupla million rows of event data):
DECLARE @DaysTable TABLE ( [Year] INT, [Day] INT )
DECLARE @StartDate DATETIME
SET @StartDate = whatever -- placeholder: substitute your own start date
WHILE (@StartDate <= GETDATE())
BEGIN
INSERT INTO @DaysTable ( [Year], [Day] )
SELECT DATEPART(YEAR, @StartDate), DATEPART(DAYOFYEAR, @StartDate)
SELECT @StartDate = DATEADD(DAY, 1, @StartDate)
END
-- This gives me a table of all days since whenever
-- (you could select @StartDate as the minimum date of your usage log)
SELECT days.[Year], days.[Day], events.NumEvents
FROM @DaysTable AS days
LEFT JOIN (
SELECT
COUNT(*) AS NumEvents,
DATEPART(YEAR, LogDate) AS [Year],
DATEPART(DAYOFYEAR, LogDate) AS [Day]
FROM LogData
GROUP BY
DATEPART(YEAR, LogDate),
DATEPART(DAYOFYEAR, LogDate)
) AS events ON days.[Year] = events.[Year] AND days.[Day] = events.[Day]

Create a memory table (a table variable) where you insert your date ranges, then outer join the logins table against it. Group by your start date, then you can perform your aggregations and calculations.

The strategy I normally use is to UNION with the opposite of the query, generally a query that retrieves data for rows that don't exist.
If I wanted to get the average mark for a course, but some courses weren't taken by any students, I'd need to UNION with those not taken by anyone to display a row for every class:
SELECT AVG(mark), course FROM `marks` GROUP BY course
UNION
SELECT NULL, course FROM courses WHERE course NOT IN
(SELECT course FROM marks)
Your query will be more complex, but the same principle should apply. You may indeed need a table of dates for your second query.

Option 1
You can create a temp table, insert the dates within the range, and do a left outer join with the usagelog.
Option 2
You can programmatically insert the missing dates while evaluating the result set to produce the final output.

WITH q(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
qq(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
dates AS
(
SELECT q.n * 100 + qq.n AS ndate
FROM q, qq
)
SELECT COUNT(userid) as numlogins,
COUNT(DISTINCT userid) as numusers,
CAST('2000-01-01' AS DATETIME) + ndate as date
FROM dates
LEFT JOIN
usagelog
ON entryts >= CAST('2000-01-01' AS DATETIME) + ndate
AND entryts < CAST('2000-01-01' AS DATETIME) + ndate + 1
GROUP BY
ndate
This will select up to 10,000 dates constructed on the fly, which should be enough for about 27 years.
By default, SQL Server allows at most 100 recursion levels per CTE query (the limit can be changed with OPTION (MAXRECURSION n)), which is why the inner queries return up to 100 rows each.
If you need more than 10,000, just add a third CTE qqq(n) and cross-join with it in dates.

Related

Match between tables with New, Not New

I am still learning TSQL at the moment and I'm new here, so forgive me if I've not done this right.
I have a table that is loaded with a new day's data each day. Each day that loads has a report date for the previous day.
I want to get yesterday's data (e.g. 17/09/2019) from the table, and I want to look at the data in the same table from the day before that (e.g. 16/09/2019) and run a check on the reference number: if the reference number appears on the day before, I want it to say Not New, and if it does not match anything from the day before, I want it to say New.
The columns I have are:
ReferenceNumber, ReportData, NewAppt
NewAppt column will be where it put the outcome of New/Not New
Something like this should work:
WITH Yesterday AS (
SELECT DISTINCT
ReferenceNumber,
CONVERT(DATE, ReportDate) AS ReportDate
FROM
MyTable
WHERE
CONVERT(DATE, ReportDate) = CONVERT(DATE, DATEADD(DAY, -1, GETDATE()))
),
DayBeforeYesterday AS (
SELECT DISTINCT
ReferenceNumber
FROM
MyTable
WHERE
CONVERT(DATE, ReportDate) = CONVERT(DATE, DATEADD(DAY, -2, GETDATE()))
)
SELECT
y.ReferenceNumber,
y.ReportDate,
CASE
WHEN x.ReferenceNumber IS NOT NULL THEN 0
ELSE 1
END AS NewAppointment
FROM
Yesterday y
LEFT JOIN DayBeforeYesterday x ON x.ReferenceNumber = y.ReferenceNumber;
Make a list of all the DISTINCT reference numbers from each day, and then join them into one big list, with the logic to see if there was a reference number yesterday that was also in the list from the day before yesterday.
I suppose your column ReportData is some sort of 'date' type and contains only the date (no time).
Furthermore, for each date, there should be at most 1 record for a specific ReferenceNumber.
In that case, try this:
SELECT t1.ReferenceNumber,
t1.ReportData,
CASE
WHEN t2.ReferenceNumber IS NULL THEN 'New'
ELSE 'Not New'
END AS NewAppt
FROM my_table t1
LEFT OUTER JOIN my_table t2
ON t1.ReferenceNumber = t2.ReferenceNumber
AND t2.ReportData = DATEADD(day, -1, t1.ReportData)
WHERE t1.ReportData = '2019-09-17';
Here's an approach using LAG which removes the need to join to the same table several times and instead just checks the preceding row for that Reference Number.
Note that in my interpretation of your request if a Reference Number disappears for a day and then returns then it's flagged as new. You can adapt the query to simply check if the number has appeared at any point in the past if that's not what you need.
CREATE TABLE #TestData (ReferenceNumber int,Reportdata date)
INSERT INTO #TestData
VALUES (1,'2019-01-16'),(1,'2019-01-17'),(1,'2019-01-18'),(2,'2019-01-18'),(3,'2019-01-17'),(3,'2019-01-18'),(4,'2019-01-17')
SELECT
ReferenceNumber
,ReportData
,IIF(
LAG(ReportData) OVER(PARTITION BY ReferenceNumber ORDER BY ReportData)
= dateadd(day,-1,ReportData)
,'Not New'
,'New'
) AS NewAppt
FROM #TestData
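To make the rule easier to see outside SQL, here is a small Python sketch of the same previous-day check applied to the test data above. It is only an illustration, not a replacement for the query: the function name `flag_new` is made up, and it assumes at most one row per reference number per date.

```python
from datetime import date, timedelta

def flag_new(rows):
    """Label each (reference_number, report_date) row 'New' or 'Not New'.

    A row is 'Not New' only if the same reference number also has a row
    for the day immediately before its report date.
    """
    seen = set(rows)
    one_day = timedelta(days=1)
    return [
        (ref, day, "Not New" if (ref, day - one_day) in seen else "New")
        for ref, day in sorted(rows)
    ]

# Same values as the #TestData insert above.
test_data = [
    (1, date(2019, 1, 16)), (1, date(2019, 1, 17)), (1, date(2019, 1, 18)),
    (2, date(2019, 1, 18)), (3, date(2019, 1, 17)), (3, date(2019, 1, 18)),
    (4, date(2019, 1, 17)),
]
for row in flag_new(test_data):
    print(row)
```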

How to get daily open inventory with a dynamic end date without storing results in an aggregate table?

I'm looking to see if there is a way to get the total daily inventory for open items in the past few months. Basically, each record has a start date and an end date. The start date is always the same. The end date will be null until it has been processed. Once processed, it is updated with a process date. Getting one day is fine, but I need to get the total volume, every day, for the last few months.
My current method of doing this is putting the results in an aggregate table. I can run the results one time through a while loop, then each day run whatever open volume there is from a stored procedure. This method works, but seems messy.
DECLARE @D AS DATE = '04/01/2019'
WHILE @D <= CAST(GETDATE() AS DATE)
BEGIN
INSERT INTO DBO.OPEN_INVENTORY
SELECT
@D OPEN_DATE
,COUNT(1) OPEN_VOLUME
FROM
DBO.INVENTORY_RECORDS
WHERE
@D BETWEEN START_DATE AND ISNULL(END_DATE,'12/31/2199')
SET @D = DATEADD(D,+1,@D)
END
I would like to reproduce these results without having to store the volumes into an aggregate table. Is there a way to accomplish this in a single select?
Yes, the best way would be to use what's known as a "Tally Table". They are extremely quick at building large sets of sequential data, and unlike a WHILE, CURSOR or rCTE, aren't recursive.
This is a bit of a stab in the dark, as I have no sample data, but I think this is what you're after.
DECLARE @D AS DATE = '20190104';
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
FROM N N1, N N2, N N3), --1000 rows should be enough?
Dates AS(
SELECT DATEADD(DAY, T.I, @D) AS CalendarDate
FROM Tally T
WHERE DATEADD(DAY, T.I, @D) <= GETDATE())
SELECT D.CalendarDate,
COUNT(IR.YourIDColumn) AS OPEN_VOLUME
FROM Dates D
LEFT JOIN DBO.INVENTORY_RECORDS IR ON D.CalendarDate >= IR.START_DATE
AND (D.CalendarDate <= IR.END_DATE OR IR.END_DATE IS NULL)
GROUP BY D.CalendarDate;
If it isn't, try to troubleshoot it yourself first, and then supply sample data and expected results.

Summing up columns from two different tables

I have two different tables FirewallLog and ProxyLog. There is no relation between these two tables. They have four common fields :
LogTime ClientIP BytesSent BytesRec
I need to Calculate the total usage of a particular ClientIP for each day over a period of time (like last month) and display it like below:
Date TotalUsage
2/12 125
2/13 145
2/14 0
. .
. .
3/11 150
3/12 125
TotalUsage is SUM(FirewallLog.BytesSent + FirewallLog.BytesRec) + SUM(ProxyLog.BytesSent + ProxyLog.BytesRec) for that IP. I have to show Zero if there is no usage (no record) for that day.
I need to find the fastest solution to this problem. Any Ideas?
First, create a Calendar table. One that has, at the very least, an id column and a calendar_date column, and fill it with dates covering every day of every year you can ever be interested in. (You'll find that you'll add flags for weekends, bank holidays and all sorts of other useful meta-data about dates.)
Then you can LEFT JOIN on to that table, after combining your two tables with a UNION.
SELECT
CALENDAR.calendar_date,
JOINT_LOG.ClientIP,
ISNULL(SUM(JOINT_LOG.BytesSent + JOINT_LOG.BytesRec), 0) AS TotalBytes
FROM
CALENDAR
LEFT JOIN
(
SELECT LogTime, ClientIP, BytesSent, BytesRec FROM FirewallLog
UNION ALL
SELECT LogTime, ClientIP, BytesSent, BytesRec FROM ProxyLog
)
AS JOINT_LOG
ON JOINT_LOG.LogTime >= CALENDAR.calendar_date
AND JOINT_LOG.LogTime < CALENDAR.calendar_date+1
WHERE
CALENDAR.calendar_date >= @start_date
AND CALENDAR.calendar_date < @cease_date
GROUP BY
CALENDAR.calendar_date,
JOINT_LOG.ClientIP
SQL Server is very good at optimising this type of UNION ALL query, assuming that you have appropriate indexes.
If you don't have a calendar table, you can create one using a recursive CTE:
declare @startdate date = '2013-02-01';
declare @enddate date = '2013-03-01';
with dates as (
select @startdate as thedate
union all
select dateadd(day, 1, thedate)
from dates
where thedate < @enddate
)
select driver.thedate, driver.ClientIP,
coalesce(fwl.FWBytes, 0) + coalesce(pl.PLBytes, 0) as TotalBytes
from (select d.thedate, fwl.ClientIP
from dates d cross join
(select distinct ClientIP from FirewallLog) fwl
) driver left outer join
(select cast(fwl.logtime as date) as thedate, fwl.ClientIP,
SUM(fwl.BytesSent + fwl.BytesRec) as FWBytes
from FirewallLog fwl
group by cast(fwl.logtime as date), fwl.ClientIP
) fwl
on driver.thedate = fwl.thedate and driver.clientIP = fwl.ClientIP left outer join
(select cast(pl.logtime as date) as thedate, pl.ClientIP,
SUM(pl.BytesSent + pl.BytesRec) as PLBytes
from ProxyLog pl
group by cast(pl.logtime as date), pl.ClientIP
) pl
on driver.thedate = pl.thedate and driver.ClientIP = pl.ClientIP
This uses a driver table that generates all the combinations of IP and date, which it then uses for joining to the summarized table. This formulation assumes that the "FirewallLog" contains all the "ClientIp"s of interest.
This also breaks out the two values, in case you also want to include them (to see which is contributing more bytes to the total, for instance).
I would recommend creating a Dates Lookup table if that is an option. Create the table once and then you can use it as often as needed. If not, you'll need to look into creating a Recursive CTE to act as the Dates table (easy enough -- look on stackoverflow for examples).
Select d.date,
results.ClientIp,
Sum(results.bytes)
From YourDateLookupTable d
Left Join (
Select ClientIp, logtime, BytesSent + BytesRec bytes From FirewallLog
Union All
Select ClientIp, logtime, BytesSent + BytesRec bytes From ProxyLog
) results On d.date = results.logtime
Group By d.date,
results.ClientIp
This assumes the logtime and date data types are the same. If logtime is a date time, you'll need to convert it to a date.

Can we do partition date wise?

I am new to the concept of partitioning. I know horizontal partitioning, but can we partition date-wise?
In my project I want a new partition to be created whenever we enter a new year. Can anyone explain how to do this? I am working on ERP software that holds data from past years, and I need to partition it year-wise (for example, APR-2011 to MAR-2012 is one year).
I think what you are looking for is something like this:
DECLARE @referenceDate datetime = '3/1/2011'
SELECT
[sub].[Period] + 1 AS [PeriodNr],
YEAR(DATEADD(YEAR, [sub].[Period], @referenceDate)) AS [PeriodStartedIn],
COUNT(*) AS [NumberOfRecords],
SUM([sub].[Value]) AS [TotalValue]
FROM
(
SELECT
*,
FLOOR(DATEDIFF(MONTH, @referenceDate, [timestamp]) / 12.0) AS [Period]
FROM [erp]
WHERE [timestamp] >= @referenceDate
) AS [sub]
GROUP BY [sub].[Period]
ORDER BY [sub].[Period] ASC
The fiddle is found here.
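The period arithmetic in that query can be checked by hand with a short Python sketch. It mirrors DATEDIFF(MONTH, ...), which counts month boundaries crossed, followed by the floor division by 12. The helper names `month_diff` and `period_number` are made up, and the reference date of 1 April 2011 is an assumption taken from the question's APR-2011 to MAR-2012 example.

```python
from datetime import date

def month_diff(start, end):
    """Number of month boundaries between two dates, like DATEDIFF(MONTH, start, end)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def period_number(reference, ts):
    """1-based fiscal period of `ts`, counted from `reference` (requires ts >= reference)."""
    return month_diff(reference, ts) // 12 + 1

reference = date(2011, 4, 1)  # assumed start of the first fiscal year
print(period_number(reference, date(2012, 3, 31)))  # last day of the first fiscal year
print(period_number(reference, date(2012, 4, 1)))   # first day of the second fiscal year
```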
When you say partitioning, I am curious whether you mean partitioning in a windowed function or just grouping data in general. Let me show you both ways of aggregating data in a self-demonstrating example:
declare @Orders table( id int identity, dt date, counts int)
insert into @Orders values ('1-1-12', 2),('1-1-12', 3),('1-18-12', 1),('2-11-12', 5),('3-1-12', 2),('6-1-12', 8),('10-1-12', 2),('1-13-13', 8)
-- To do days I need to do a group by
select
dt as DayDate
, SUM(counts) as sums
from @Orders
group by dt
-- To do months I need to group differently
select
DATEADD(month, datediff(month, 0, dt), 0) as MonthDate
-- The expression above is a grouping trick: it counts the number of months between 1/1/1900 and my date field and adds them back to 1/1/1900.
-- This always yields the first day of the month of the date field.
, SUM(counts) as sums
from @Orders
group by DATEADD(month, datediff(month, 0, dt), 0)
-- Well, that is great, but what if I want to group different ways all at once?
-- This is why windowed functions rock:
select
dt
, counts
, SUM(counts) over(partition by DATEADD(year, datediff(year, 0, dt), 0)) as YearPartitioning
, SUM(counts) over(partition by DATEADD(month, datediff(month, 0, dt), 0)) as MonthPartitioning
-- The expression above will work for year, month, day, minute, etc. You just need to use the same date part in both the dateadd and datediff functions.
, SUM(counts) over(partition by dt) as DayPartitioning
from @Orders
The important concept with the traditional group by clause is that you MUST LIST every column that is NOT inside an aggregate function; those columns act as the pivot the aggregation works on. So in my first select I just chose the date and then said sum(counts). It saw that 1-1-12 had two values, so it added them together, and every other date kept its single value. In the second select I perform a trick on the date field to transform it to the first day of the month. Now this is great, but I may want all of this at once.
Windowed functions do groupings inline, meaning they don't need a group by clause, as that is what the over() portion is doing. They may, however, repeat values, since you are not collapsing your dataset. If you look at the third column of the third select, 'YearPartitioning', it repeats the number 23 seven times. Why? Because you never told the statement to do any grouping outside the function, so it shows every row. The number 23 appears on every row whose date falls in the same year. Just remember this when selecting from a windowed expression.
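The difference can also be mimicked in a few lines of Python, as a rough sketch using the same eight rows (dates read as month-day-year). The helper names `group_by_sum` and `window_sum` are made up: the first collapses rows like GROUP BY, the second keeps every row and attaches its partition total like SUM(...) OVER (PARTITION BY ...).

```python
from collections import defaultdict
from datetime import date

orders = [
    (date(2012, 1, 1), 2), (date(2012, 1, 1), 3), (date(2012, 1, 18), 1),
    (date(2012, 2, 11), 5), (date(2012, 3, 1), 2), (date(2012, 6, 1), 8),
    (date(2012, 10, 1), 2), (date(2013, 1, 13), 8),
]

def group_by_sum(rows, key):
    """Collapse rows to one total per key, like GROUP BY with SUM."""
    totals = defaultdict(int)
    for dt, counts in rows:
        totals[key(dt)] += counts
    return dict(totals)

def window_sum(rows, key):
    """Keep every row and append its partition total, like SUM() OVER (PARTITION BY ...)."""
    totals = group_by_sum(rows, key)
    return [(dt, counts, totals[key(dt)]) for dt, counts in rows]

by_year = group_by_sum(orders, lambda d: d.year)
windowed = window_sum(orders, lambda d: d.year)
print(by_year)   # one entry per year
print(windowed)  # eight rows, each carrying its year's total
```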

Select data from SQL DB per day

I have a table with order information in an E-commerce store. Schema looks like this:
[Orders]
Id|SubTotal|TaxAmount|ShippingAmount|DateCreated
This table only contains a row for each order. So if a day goes by without any orders, there is no sales data for that day.
I would like to select subtotal-per-day for the last 30 days, including those days with no sales.
The resultset would look like this:
Date | SalesSum
2009-08-01 | 15235
2009-08-02 | 0
2009-08-03 | 340
2009-08-04 | 0
...
Doing this, only gives me data for those days with orders:
select DateCreated as Date, sum(ordersubtotal) as SalesSum
from Orders
group by DateCreated
You could create a table called Dates, and select from that table and join the Orders table. But I really want to avoid that, because it doesn't work well enough when dealing with different time zones and things...
Please don't laugh. SQL is not my kind of thing... :)
Create a function that can generate a date table as follows:
(stolen from http://www.codeproject.com/KB/database/GenerateDateTable.aspx)
Create Function dbo.fnDateTable
(
@StartDate datetime,
@EndDate datetime,
@DayPart char(5) -- support 'day','month','year','hour', default 'day'
)
Returns @Result Table
(
[Date] datetime
)
As
Begin
Declare @CurrentDate datetime
Set @CurrentDate=@StartDate
While @CurrentDate<=@EndDate
Begin
Insert Into @Result Values (@CurrentDate)
Select @CurrentDate=
Case
When @DayPart='year' Then DateAdd(yy,1,@CurrentDate)
When @DayPart='month' Then DateAdd(mm,1,@CurrentDate)
When @DayPart='hour' Then DateAdd(hh,1,@CurrentDate)
Else
DateAdd(dd,1,@CurrentDate)
End
End
Return
End
Then, join against that table:
SELECT dates.Date as Date, sum(SubTotal+TaxAmount+ShippingAmount)
FROM dbo.fnDateTable(dateadd(month,-1,CONVERT(VARCHAR(10),GETDATE(),111)),CONVERT(VARCHAR(10),GETDATE(),111),'day') dates
LEFT JOIN Orders
ON dates.Date = DateCreated
GROUP BY dates.Date
declare @oldest_date datetime
declare @daily_sum numeric(18,2)
declare @temp table(
sales_date datetime,
sales_sum numeric(18,2)
)
-- midnight 30 days ago; assumes DateCreated holds dates without a time portion
select @oldest_date = dateadd(day, datediff(day, 0, getdate()) - 30, 0)
while @oldest_date <= getdate()
begin
-- isnull turns the NULL sum of a day without sales into 0
set @daily_sum = (select isnull(sum(SubTotal), 0) from SalesTable where DateCreated = @oldest_date)
insert into @temp(sales_date, sales_sum) values(@oldest_date, @daily_sum)
set @oldest_date = dateadd(day,1,@oldest_date)
end
select * from @temp
OK - I missed that 'last 30 days' part. The bit above, while not as clean, IMHO, as the date table, should work. Another variant would be to use the while loop to fill a temp table just with the last 30 days and do a left outer join with the result of my original query.
including those days with no sales.
That's the difficult part. I don't think the first answer will help you with that. I did something similar to this with a separate date table.
You can find the directions on how to do so here:
Date Table
I have a Log table with an indexed LogID column, and I never delete any records from it. Its LogID values run from 1 to ~10000000. Using this table I can write:
select
s.ddate, SUM(isnull(o.SubTotal,0))
from
(
select
cast(datediff(d,LogID,getdate()) as datetime) AS ddate
from
Log
where
LogID <31
) s left join orders o on o.orderdate = s.ddate
group by s.ddate
I actually did this today. We also have an e-commerce application. I don't want to fill our database with "useless" dates. I just do the group by, create all the days for the last N days in Java, and pair them with the date/sales results from the database.
Where is this ultimately going to end up? I ask only because it may be easier to fill in the empty days with whatever program is going to deal with the data instead of trying to get it done in SQL.
SQL is a wonderful language, and it is capable of a great many things, but sometimes you're just better off working the finer points of the data in the program instead.
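As a rough sketch of what filling in the empty days in the program could look like, here is a Python version. It assumes the grouped query result has been loaded into a dict keyed by `datetime.date`; the function name `last_n_days_sales` and the sample figures (taken from the question's expected output) are only illustrative.

```python
from datetime import date, timedelta

def last_n_days_sales(sales_by_day, today, n=30):
    """Return (day, sales_sum) for the last n days ending at `today`, oldest first.

    Days that are missing from `sales_by_day` are reported with a sum of 0.
    """
    days = [today - timedelta(days=offset) for offset in range(n - 1, -1, -1)]
    return [(day, sales_by_day.get(day, 0)) for day in days]

# Hypothetical result of the plain GROUP BY query: only days with orders appear.
sales_by_day = {date(2009, 8, 1): 15235, date(2009, 8, 3): 340}
for day, total in last_n_days_sales(sales_by_day, date(2009, 8, 4), n=4):
    print(day, total)
```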
(Revised a bit--I hit enter too soon)
I started poking at this, and as it hits some pretty tricky SQL concepts it quickly grew into the following monster. If feasible, you might be better off adapting THEn's solution; or, like many others advise, using application code to fill in the gaps could be preferable.
-- A temp table holding the 30 dates that you want to check
DECLARE @Foo Table (Date smalldatetime not null)
-- Populate the table using a common "tally table" methodology (I got this from SQL Server magazine long ago)
;WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows
L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows
L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows
Tally AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS Number FROM L3)
INSERT @Foo (Date)
select dateadd(dd, datediff(dd, 0, dateadd(dd, -number + 1, getdate())), 0)
from Tally
where Number < 31
Step 1 is to build a temp table containing the 30 dates that you are concerned with. That abstract weirdness is about the fastest way known to build a table of consecutive integers; add a few more subqueries, and you can populate millions or more in mere seconds. I take the first 30, and use dateadd and the current date/time to convert them into dates. If you already have a "fixed" table that has 1-30, you can use that and skip the CTE entirely (by replacing table "Tally" with your table).
The outer two date function calls remove the time portion of the generated value.
(Note that I assume that your order date also has no time portion -- otherwise you've got another common problem to resolve.)
For testing purposes I built table #Orders, and this gets you the rest:
SELECT f.Date, sum(ordersubtotal) as SalesSum
from @Foo f
left outer join #Orders o
on o.DateCreated = f.Date
group by f.Date
I created the Function DateTable as JamesMLV pointed out to me.
And then the SQL looks like this:
SELECT dates.date, ISNULL(SUM(ordersubtotal), 0) as Sales FROM [dbo].[DateTable] ('2009-08-01','2009-08-31','day') dates
LEFT JOIN Orders ON CONVERT(VARCHAR(10),Orders.datecreated, 111) = dates.date
group by dates.date
SELECT DateCreated,
SUM(SubTotal) AS SalesSum
FROM Orders
GROUP BY DateCreated