Create data for all dates in a date range - sql

Here is the input dataset I have with revenue for some of the days:
Would need an output with all dates between 03/01/2021 to 03/15/2021. $0 revenue where the value is not present

You could use two additional helper tables:
a Dates table that holds all the dates in the desired range, and
a Zipcodes table that holds the distinct zip codes.
Instead of physical tables, temporary tables and/or table variables, you could also consider using table expressions (subqueries in the FROM-clause) or common table expressions (in a WITH-clause). Common table expressions can often also be recursive, which might be a nice solution for creating a value range with specific start and end values (like your dates range here).
It is already pointed out, that concrete solution proposals heavily depend on the target DBMS. Sadly, it is unspecified here (at the time of writing this answer). Below is a sample implementation for Microsoft SQL Server, using T as the placeholder for your actual table name. It uses two common table expressions: a recursive CTE for the Dates table and a normal CTE for the Zipcodes table.
WITH
[Dates] AS (
SELECT CAST('2020-03-01' AS DATE) AS [Date]
UNION ALL
SELECT DATEADD(DAY, 1, [Date])
FROM [Dates]
WHERE [Date] < '2020-03-15'
),
[ZipCodes] AS (
SELECT DISTINCT [Zip] FROM T
)
SELECT D.[Date], Z.[Zip], COALESCE(T.[Revenue], 0) AS [Revenue]
FROM
[Dates] AS D
CROSS JOIN [ZipCodes] AS Z
LEFT JOIN T ON T.[Date] = D.[Date] AND T.[Zip] = Z.[Zip]
ORDER BY Z.[Zip], D.[Date]

You haven't specified your RDBMS, it's probably going to be implemented differently in different systems. E.g. in the SQL Server there's a master.dbo.spt_values table, which can potentially be used for that:
SELECT DATEADD(day, d.number, '2021-03-01') as Date,
zips.Zip,
IsNull(t.Revenue, 0) as Revenue
FROM master.dbo.spt_values d
CROSS JOIN (SELECT DISTINCT(Zip) FROM Table) zips
LEFT JOIN Table t
ON DATEADD(day, d.number, '2021-03-01') = t.Date
AND zips.Zip = t.Zip
WHERE d.type = 'P'
AND d.number < 15
ORDER BY zips.Zip, d.number
It's somewhat of a hack though

Related

How to add a set of dates for each category in a dimension?

I have data that looks like this where there is a monthly count of a particular animal for each month. By default, it aggregates in the month where there is data.
However, I would like to like to have a default set of dates for each animal up to the current month date with 0 if there's no data. Desired Result -
Is there a way to handle with a on sql server and not in Excel?
Much appreciated in advance.
You can generate the months you want using a numbers table or recursive CTE (or calendar table). Then cross join with the animals to generate the rows and use left join to bring in the existing data:
with dates as (
select min(date) as dte
from t
union all
select dateadd(month, 1 dte)
from dates
where dte < getdate()
)
select a.animal, d.dte, coalesce(t.monthly_count, 0) as monthly_count
from dates d cross join
(select distinct animal from t) a left join
data t
on t.date = d.dte and t.animal = a.animal
order by a.animal, d.dte;

How to synthesize too many SQL calls into one

A vb.net desktop application must make hundreds of these calls on a SQL Server (2012). The declaration is composed in the application and requires hundreds (or sometimes thousands) of times from the database.
I was looking for a way to synthesize the call into a single SQL statement or stored procedure if possible.
The difference between one statement and another is the "day" variable (in the example code is '20190724') , which is incremented by one day for each call.
SELECT SUM(table1.qta)
FROM table1
WHERE (table1.id = 35)
AND ('20190724' BETWEEN table1.date1 AND table1.date2)
The expected result would be 2 columns:
day1 SumQta1
day2 SumQta2
day3 SumQta3
....
What you should do is to change your approach.
Instead of querying the sum for each day individually, create a user defined table type to hold the dates, and use it to pass a table valued parameter to the database.
Then query the table joined to the variable using sum and group by the date.
Your SQL should look something like this:
CREATE TYPE dbo.DatesList
(
Date Date NOT NULL PRIMARY KEY
);
CREATE PROCEDURE GetSums
(
#Dates dbo.DatesList readonly
)
AS
SELECT Dates.Date, sum(table1.qta)
FROM table1
JOIN #Dates As Dates
ON Dates.Date BETWEEN table1.date1 AND table1.date2
WHERE table1.id = 35
GROUP BY Dates.Date
As for how to use a table valued parameter from Vb.Net - Well, that depends on how you connect to the database in the first place. I'm sure there are plenty of resources here on stackoverflow and also out on the web to help with that part.
Well, you could build a VALUES clause in your program with the days of interest. Then left join the table and aggregate to get the sum. Back in your application you can then iterate the result day for day.
SELECT d.day,
coalesce(sum(t.qta)) qta
FROM (VALUES ('20190101'),
('20190102'),
...) d (day)
LEFT JOIN table1 t
ON t.id = 35
AND d.day BETWEEN t.date1
AND t.date2
GROUP BY d.day
ORDER BY d.day;
Another option would be the use of a recursive CTE to generate the days from a minimum and maximum and then do the join on that, again aggregating to get the sums.
WITH
d (day)
AS
(
SELECT convert(date, '20190101') day
UNION ALL
SELECT dateadd(day, 1, day)
FROM d
WHERE dateadd(day, 1, day) <= convert(date, '20190131')
)
SELECT d.day,
coalesce(sum(t.qta)) qta
FROM d
LEFT JOIN table1 t
ON t.id = 35
AND d.day BETWEEN t.date1
AND t.date2
GROUP BY d.day
ORDER BY d.day;

Summing up columns from two different tables

I have two different tables FirewallLog and ProxyLog. There is no relation between these two tables. They have four common fields :
LogTime ClientIP BytesSent BytesRec
I need to Calculate the total usage of a particular ClientIP for each day over a period of time (like last month) and display it like below:
Date TotalUsage
2/12 125
2/13 145
2/14 0
. .
. .
3/11 150
3/12 125
TotalUsage is SUM(FirewallLog.BytesSent + FirewallLog.BytesRec) + SUM(ProxyLog.BytesSent + ProxyLog.BytesRec) for that IP. I have to show Zero if there is no usage (no record) for that day.
I need to find the fastest solution to this problem. Any Ideas?
First, create a Calendar table. One that has, at the very least, an id column and a calendar_date column, and fill it with dates covering every day of every year you can ever be interested in . (You'll find that you'll add flags for weekends, bankholidays and all sorts of other useful meta-data about dates.)
Then you can LEFT JOIN on to that table, after combining your two tables with a UNION.
SELECT
CALENDAR.calendar_date,
JOINT_LOG.ClientIP,
ISNULL(SUM(JOINT_LOG.BytesSent + JOINT_LOG.BytesRec), 0) AS TotalBytes
FROM
CALENDAR
LEFT JOIN
(
SELECT LogTime, ClientIP, BytesSent, BytesRec FROM FirewallLog
UNION ALL
SELECT LogTime, ClientIP, BytesSent, BytesRec FROM ProxyLog
)
AS JOINT_LOG
ON JOINT_LOG.LogTime >= CALENDAR.calendar_date
AND JOINT_LOG.LogTime < CALENDAR.calendar_date+1
WHERE
CALENDAR.calendar_date >= #start_date
AND CALENDAR.calendar_date < #cease_date
GROUP BY
CALENDAR.calendar_date,
JOINT_LOG.ClientIP
SQL Server is very good at optimising this type of UNION ALL query. Assuming that you have appropriate indexes.
If you don't have a calendar table, you can create one using a recursive CTE:
declare #startdate date = '2013-02-01';
declare #enddate date = '2013-03-01';
with dates as (
select #startdate as thedate
union all
select dateadd(day, 1, thedate)
from dates
where thedate < #enddate
)
select driver.thedate, driver.ClientIP,
coalesce(fwl.FWBytes, 0) + coalesce(pl.PLBytes, 0) as TotalBytes
from (select d.thedate, fwl.ClientIP
from dates d cross join
(select distinct ClientIP from FirewallLog) fwl
) driver left outer join
(select cast(fwl.logtime as date) as thedate,
SUM(fwl.BytesSent + fwl.BytesRec) as FWBytes
from FirewallLog fwl
group by cast(fwl.logtime as date)
) fwl
on driver.thedate = fwl.thedate and driver.clientIP = fwl.ClientIP left outer join
(select cast(pl.logtime as date) as thedate,
SUM(pl.BytesSent + pl.BytesRec) as PLBytes
from ProxyLog pl
group by cast(pl.logtime as date)
) pl
on driver.thedate = pl.thedate and driver.ClientIP = pl.ClientIP
This uses a driver table that generates all the combinations of IP and date, which it then uses for joining to the summarized table. This formulation assumes that the "FirewallLog" contains all the "ClientIp"s of interest.
This also breaks out the two values, in case you also want to include them (to see which is contributing more bytes to the total, for instance).
I would recommend creating a Dates Lookup table if that is an option. Create the table once and then you can use it as often as needed. If not, you'll need to look into creating a Recursive CTE to act as the Dates table (easy enough -- look on stackoverflow for examples).
Select d.date,
results.ClientIp
Sum(results.bytes)
From YourDateLookupTable d
Left Join (
Select ClientIp, logtime, BytesSent + BytesRec bytes From FirewallLog
Union All
Select ClientIp, logtime, BytesSent + BytesRec bytes From ProxyLog
) results On d.date = results.logtime
Group By d.date,
results.ClientIp
This assumes the logtime and date data types are the same. If logtime is a date time, you'll need to convert it to a date.

Select data from SQL DB per day

I have a table with order information in an E-commerce store. Schema looks like this:
[Orders]
Id|SubTotal|TaxAmount|ShippingAmount|DateCreated
This table does only contain data for every Order. So if a day goes by without any orders, no sales data is there for that day.
I would like to select subtotal-per-day for the last 30 days, including those days with no sales.
The resultset would look like this:
Date | SalesSum
2009-08-01 | 15235
2009-08-02 | 0
2009-08-03 | 340
2009-08-04 | 0
...
Doing this, only gives me data for those days with orders:
select DateCreated as Date, sum(ordersubtotal) as SalesSum
from Orders
group by DateCreated
You could create a table called Dates, and select from that table and join the Orders table. But I really want to avoid that, because it doesn't work good enough when dealing with different time zones and things...
Please don't laugh. SQL is not my kind of thing... :)
Create a function that can generate a date table as follows:
(stolen from http://www.codeproject.com/KB/database/GenerateDateTable.aspx)
Create Function dbo.fnDateTable
(
#StartDate datetime,
#EndDate datetime,
#DayPart char(5) -- support 'day','month','year','hour', default 'day'
)
Returns #Result Table
(
[Date] datetime
)
As
Begin
Declare #CurrentDate datetime
Set #CurrentDate=#StartDate
While #CurrentDate<=#EndDate
Begin
Insert Into #Result Values (#CurrentDate)
Select #CurrentDate=
Case
When #DayPart='year' Then DateAdd(yy,1,#CurrentDate)
When #DayPart='month' Then DateAdd(mm,1,#CurrentDate)
When #DayPart='hour' Then DateAdd(hh,1,#CurrentDate)
Else
DateAdd(dd,1,#CurrentDate)
End
End
Return
End
Then, join against that table
SELECT dates.Date as Date, sum(SubTotal+TaxAmount+ShippingAmount)
FROM [fnDateTable] (dateadd("m",-1,CONVERT(VARCHAR(10),GETDATE(),111)),CONVERT(VARCHAR(10),GETDATE(),111),'day') dates
LEFT JOIN Orders
ON dates.Date = DateCreated
GROUP BY dates.Date
declare #oldest_date datetime
declare #daily_sum numeric(18,2)
declare #temp table(
sales_date datetime,
sales_sum numeric(18,2)
)
select #oldest_date = dateadd(day,-30,getdate())
while #oldest_date <= getdate()
begin
set #daily_sum = (select sum(SubTotal) from SalesTable where DateCreated = #oldest_date)
insert into #temp(sales_date, sales_sum) values(#oldest_date, #daily_sum)
set #oldest_date = dateadd(day,1,#oldest_date)
end
select * from #temp
OK - I missed that 'last 30 days' part. The bit above, while not as clean, IMHO, as the date table, should work. Another variant would be to use the while loop to fill a temp table just with the last 30 days and do a left outer join with the result of my original query.
including those days with no sales.
That's the difficult part. I don't think the first answer will help you with that. I did something similar to this with a separate date table.
You can find the directions on how to do so here:
Date Table
I have a Log table table with LogID an index which i never delete any records. it has index from 1 to ~10000000. Using this table I can write
select
s.ddate, SUM(isnull(o.SubTotal,0))
from
(
select
cast(datediff(d,LogID,getdate()) as datetime) AS ddate
from
Log
where
LogID <31
) s right join orders o on o.orderdate = s.ddate
group by s.ddate
I actually did this today. We also got a e-commerce application. I don't want to fill our database with "useless" dates. I just do the group by and create all the days for the last N days in Java, and peer them with the date/sales results from the database.
Where is this ultimately going to end up? I ask only because it may be easier to fill in the empty days with whatever program is going to deal with the data instead of trying to get it done in SQL.
SQL is a wonderful language, and it is capable of a great many things, but sometimes you're just better off working the finer points of the data in the program instead.
(Revised a bit--I hit enter too soon)
I started poking at this, and as it hits some pretty tricky SQL concepts it quickly grew into the following monster. If feasible, you might be better off adapting THEn's solution; or, like many others advise, using application code to fill in the gaps could be preferrable.
-- A temp table holding the 30 dates that you want to check
DECLARE #Foo Table (Date smalldatetime not null)
-- Populate the table using a common "tally table" methodology (I got this from SQL Server magazine long ago)
;WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows
L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows
L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows
Tally AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS Number FROM L3)
INSERT #Foo (Date)
select dateadd(dd, datediff(dd, 0, dateadd(dd, -number + 1, getdate())), 0)
from Tally
where Number < 31
Step 1 is to build a temp table containint the 30 dates that you are concerned with. That abstract wierdness is about the fastest way known to build a table of consecutive integers; add a few more subqueries, and you can populate millions or more in mere seconds. I take the first 30, and use dateadd and the current date/time to convert them into dates. If you already have a "fixed" table that has 1-30, you can use that and skip the CTE entirely (by replacing table "Tally" with your table).
The outer two date function calls remove the time portion of the generated string.
(Note that I assume that your order date also has no time portion -- otherwise you've got another common problem to resolve.)
For testing purposes I built table #Orders, and this gets you the rest:
SELECT f.Date, sum(ordersubtotal) as SalesSum
from #Foo f
left outer join #Orders o
on o.DateCreated = f.Date
group by f.Date
I created the Function DateTable as JamesMLV pointed out to me.
And then the SQL looks like this:
SELECT dates.date, ISNULL(SUM(ordersubtotal), 0) as Sales FROM [dbo].[DateTable] ('2009-08-01','2009-08-31','day') dates
LEFT JOIN Orders ON CONVERT(VARCHAR(10),Orders.datecreated, 111) = dates.date
group by dates.date
SELECT DateCreated,
SUM(SubTotal) AS SalesSum
FROM Orders
GROUP BY DateCreated

SQL for counting events by date

I feel like I've seen this question asked before, but neither the SO search nor google is helping me... maybe I just don't know how to phrase the question. I need to count the number of events (in this case, logins) per day over a given time span so that I can make a graph of website usage. The query I have so far is this:
select
count(userid) as numlogins,
count(distinct userid) as numusers,
convert(varchar, entryts, 101) as date
from
usagelog
group by
convert(varchar, entryts, 101)
This does most of what I need (I get a row per date as the output containing the total number of logins and the number of unique users on that date). The problem is that if no one logs in on a given date, there will not be a row in the dataset for that date. I want it to add in rows indicating zero logins for those dates. There are two approaches I can think of for solving this, and neither strikes me as very elegant.
Add a column to the result set that lists the number of days between the start of the period and the date of the current row. When I'm building my chart output, I'll keep track of this value and if the next row is not equal to the current row plus one, insert zeros into the chart for each of the missing days.
Create a "date" table that has all the dates in the period of interest and outer join against it. Sadly, the system I'm working on already has a table for this purpose that contains a row for every date far into the future... I don't like that, and I'd prefer to avoid using it, especially since that table is intended for another module of the system and would thus introduce a dependency on what I'm developing currently.
Any better solutions or hints at better search terms for google? Thanks.
Frankly, I'd do this programmatically when building the final output. You're essentially trying to read something from the database which is not there (data for days that have no data). SQL isn't really meant for that sort of thing.
If you really want to do that, though, a "date" table seems your best option. To make it a bit nicer, you could generate it on the fly, using i.e. your DB's date functions and a derived table.
I had to do exactly the same thing recently. This is how I did it in T-SQL (
YMMV on speed, but I've found it performant enough over a coupla million rows of event data):
DECLARE #DaysTable TABLE ( [Year] INT, [Day] INT )
DECLARE #StartDate DATETIME
SET #StartDate = whatever
WHILE (#StartDate <= GETDATE())
BEGIN
INSERT INTO #DaysTable ( [Year], [Day] )
SELECT DATEPART(YEAR, #StartDate), DATEPART(DAYOFYEAR, #StartDate)
SELECT #StartDate = DATEADD(DAY, 1, #StartDate)
END
-- This gives me a table of all days since whenever
-- you could select #StartDate as the minimum date of your usage log)
SELECT days.Year, days.Day, events.NumEvents
FROM #DaysTable AS days
LEFT JOIN (
SELECT
COUNT(*) AS NumEvents
DATEPART(YEAR, LogDate) AS [Year],
DATEPART(DAYOFYEAR, LogDate) AS [Day]
FROM LogData
GROUP BY
DATEPART(YEAR, LogDate),
DATEPART(DAYOFYEAR, LogDate)
) AS events ON days.Year = events.Year AND days.Day = events.Day
Create a memory table (a table variable) where you insert your date ranges, then outer join the logins table against it. Group by your start date, then you can perform your aggregations and calculations.
The strategy I normally use is to UNION with the opposite of the query, generally a query that retrieves data for rows that don't exist.
If I wanted to get the average mark for a course, but some courses weren't taken by any students, I'd need to UNION with those not taken by anyone to display a row for every class:
SELECT AVG(mark), course FROM `marks`
UNION
SELECT NULL, course FROM courses WHERE course NOT IN
(SELECT course FROM marks)
Your query will be more complex but the same principle should apply. You may indeed need a table of dates for your second query
Option 1
You can create a temp table and insert dates with the range and do a left outer join with the usagelog
Option 2
You can programmetically insert the missing dates while evaluating the result set to produce the final output
WITH q(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
qq(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
dates AS
(
SELECT q.n * 100 + qq.n AS ndate
FROM q, qq
)
SELECT COUNT(userid) as numlogins,
COUNT(DISTINCT userid) as numusers,
CAST('2000-01-01' + ndate AS DATETIME) as date
FROM dates
LEFT JOIN
usagelog
ON entryts >= CAST('2000-01-01' AS DATETIME) + ndate
AND entryts < CAST('2000-01-01' AS DATETIME) + ndate + 1
GROUP BY
ndate
This will select up to 10,000 dates constructed on the fly, that should be enough for 30 years.
SQL Server has a limitation of 100 recursions per CTE, that's why the inner queries can return up to 100 rows each.
If you need more than 10,000, just add a third CTE qqq(n) and cross-join with it in dates.