Summing up columns from two different tables - sql

I have two different tables, FirewallLog and ProxyLog. There is no relation between these two tables. They have four common fields:
LogTime ClientIP BytesSent BytesRec
I need to calculate the total usage of a particular ClientIP for each day over a period of time (such as the last month) and display it like below:
Date TotalUsage
2/12 125
2/13 145
2/14 0
. .
. .
3/11 150
3/12 125
TotalUsage is SUM(FirewallLog.BytesSent + FirewallLog.BytesRec) + SUM(ProxyLog.BytesSent + ProxyLog.BytesRec) for that IP. I have to show zero if there is no usage (no record) for that day.
I need to find the fastest solution to this problem. Any ideas?

First, create a Calendar table: one that has, at the very least, an id column and a calendar_date column. Fill it with dates covering every day of every year you can ever be interested in. (You'll find that you'll add flags for weekends, bank holidays and all sorts of other useful metadata about dates.)
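As a rough sketch, such a table could be created and filled once like this. The name dbo.Calendar, the is_weekend flag and the 2010 to 2030 range are assumptions for illustration, not part of the original answer.

```sql
-- Hypothetical names and date range; adjust to your own schema.
CREATE TABLE dbo.Calendar (
    id            INT IDENTITY(1, 1) PRIMARY KEY,
    calendar_date DATE NOT NULL UNIQUE,
    is_weekend    BIT  NOT NULL
);

DECLARE @d DATE = '20100101';

-- One-off fill: one row per day from 2010-01-01 through 2030-12-31.
WHILE @d < '20310101'
BEGIN
    INSERT INTO dbo.Calendar (calendar_date, is_weekend)
    VALUES (@d,
            -- 2000-01-01 was a Saturday, so day offsets 0 and 1 (mod 7)
            -- are Saturday and Sunday regardless of the DATEFIRST setting.
            CASE WHEN DATEDIFF(DAY, '20000101', @d) % 7 IN (0, 1)
                 THEN 1 ELSE 0 END);
    SET @d = DATEADD(DAY, 1, @d);
END;
```

A row-by-row loop is slow but acceptable here because the table is populated only once.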
Then you can LEFT JOIN on to that table, after combining your two tables with a UNION.
SELECT
CALENDAR.calendar_date,
JOINT_LOG.ClientIP,
ISNULL(SUM(JOINT_LOG.BytesSent + JOINT_LOG.BytesRec), 0) AS TotalBytes
FROM
CALENDAR
LEFT JOIN
(
SELECT LogTime, ClientIP, BytesSent, BytesRec FROM FirewallLog
UNION ALL
SELECT LogTime, ClientIP, BytesSent, BytesRec FROM ProxyLog
)
AS JOINT_LOG
ON JOINT_LOG.LogTime >= CALENDAR.calendar_date
AND JOINT_LOG.LogTime < DATEADD(DAY, 1, CALENDAR.calendar_date)
WHERE
CALENDAR.calendar_date >= @start_date
AND CALENDAR.calendar_date < @cease_date
GROUP BY
CALENDAR.calendar_date,
JOINT_LOG.ClientIP
SQL Server is very good at optimising this type of UNION ALL query, assuming that you have appropriate indexes.
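Because the question asks about one particular ClientIP and wants a zero on days without records, a variant that filters on the IP inside the derived table may be closer to the desired output. This is a minimal sketch: the table variables and the @client_ip value are stand-ins for the real Calendar, FirewallLog and ProxyLog tables.

```sql
-- Sketch only: table variables stand in for the real tables.
DECLARE @Calendar    TABLE (calendar_date DATE PRIMARY KEY);
DECLARE @FirewallLog TABLE (LogTime DATETIME, ClientIP VARCHAR(15), BytesSent BIGINT, BytesRec BIGINT);
DECLARE @ProxyLog    TABLE (LogTime DATETIME, ClientIP VARCHAR(15), BytesSent BIGINT, BytesRec BIGINT);

INSERT INTO @Calendar VALUES ('20130212'), ('20130213'), ('20130214');
INSERT INTO @FirewallLog VALUES
    ('2013-02-12T08:00:00', '10.0.0.5', 50, 25),
    ('2013-02-13T09:30:00', '10.0.0.5', 100, 20),
    ('2013-02-13T10:00:00', '10.0.0.9', 999, 999);   -- other client, ignored
INSERT INTO @ProxyLog VALUES
    ('2013-02-12T23:59:00', '10.0.0.5', 30, 20),
    ('2013-02-13T00:00:00', '10.0.0.5', 20, 5);

DECLARE @client_ip VARCHAR(15) = '10.0.0.5';

SELECT c.calendar_date,
       ISNULL(SUM(l.BytesSent + l.BytesRec), 0) AS TotalUsage
FROM @Calendar AS c
LEFT JOIN (
    -- Filter here, not in the outer WHERE, so empty days survive the LEFT JOIN.
    SELECT LogTime, BytesSent, BytesRec FROM @FirewallLog WHERE ClientIP = @client_ip
    UNION ALL
    SELECT LogTime, BytesSent, BytesRec FROM @ProxyLog WHERE ClientIP = @client_ip
) AS l
  ON l.LogTime >= c.calendar_date
 AND l.LogTime <  DATEADD(DAY, 1, c.calendar_date)
GROUP BY c.calendar_date
ORDER BY c.calendar_date;
-- Returns 2013-02-12 -> 125, 2013-02-13 -> 145, 2013-02-14 -> 0
```

Putting the IP filter in an outer WHERE clause instead would discard the calendar rows that have no matching log entries, which defeats the purpose of the LEFT JOIN.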

If you don't have a calendar table, you can create one using a recursive CTE:
declare @startdate date = '2013-02-01';
declare @enddate date = '2013-03-01';
with dates as (
select @startdate as thedate
union all
select dateadd(day, 1, thedate)
from dates
where thedate < @enddate
)
select driver.thedate, driver.ClientIP,
coalesce(fwl.FWBytes, 0) + coalesce(pl.PLBytes, 0) as TotalBytes
from (select d.thedate, fwl.ClientIP
from dates d cross join
(select distinct ClientIP from FirewallLog) fwl
) driver left outer join
(select cast(fwl.logtime as date) as thedate, fwl.ClientIP,
SUM(fwl.BytesSent + fwl.BytesRec) as FWBytes
from FirewallLog fwl
group by cast(fwl.logtime as date), fwl.ClientIP
) fwl
on driver.thedate = fwl.thedate and driver.clientIP = fwl.ClientIP left outer join
(select cast(pl.logtime as date) as thedate, pl.ClientIP,
SUM(pl.BytesSent + pl.BytesRec) as PLBytes
from ProxyLog pl
group by cast(pl.logtime as date), pl.ClientIP
) pl
on driver.thedate = pl.thedate and driver.ClientIP = pl.ClientIP
This uses a driver table that generates all the combinations of IP and date, which it then uses for joining to the summarized table. This formulation assumes that the "FirewallLog" contains all the "ClientIp"s of interest.
This also breaks out the two values, in case you also want to include them (to see which is contributing more bytes to the total, for instance).
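If FirewallLog cannot be assumed to contain every ClientIP of interest, the list of IPs for the driver table could be taken from both logs instead. A minimal sketch, with table variables standing in for the real tables:

```sql
-- Sketch only: table variables stand in for the real log tables.
DECLARE @FirewallLog TABLE (ClientIP VARCHAR(15));
DECLARE @ProxyLog    TABLE (ClientIP VARCHAR(15));

INSERT INTO @FirewallLog VALUES ('10.0.0.5'), ('10.0.0.5'), ('10.0.0.9');
INSERT INTO @ProxyLog    VALUES ('10.0.0.9'), ('10.0.0.12');

-- UNION (without ALL) removes duplicates, so each IP appears once.
SELECT ClientIP FROM @FirewallLog
UNION
SELECT ClientIP FROM @ProxyLog;
-- Returns three rows: 10.0.0.5, 10.0.0.9 and 10.0.0.12 (order not guaranteed)
```

This query would replace the `select distinct ClientIP from FirewallLog` subquery inside the driver.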

I would recommend creating a Dates Lookup table if that is an option. Create the table once and then you can use it as often as needed. If not, you'll need to look into creating a recursive CTE to act as the Dates table (easy enough; look on Stack Overflow for examples).
Select d.date,
results.ClientIp,
Sum(results.bytes) As TotalBytes
From YourDateLookupTable d
Left Join (
Select ClientIp, logtime, BytesSent + BytesRec bytes From FirewallLog
Union All
Select ClientIp, logtime, BytesSent + BytesRec bytes From ProxyLog
) results On d.date = results.logtime
Group By d.date,
results.ClientIp
This assumes the logtime and date data types are the same. If logtime is a datetime, you'll need to convert it to a date.
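The conversion mentioned above might look like the following sketch, where the table variables and their contents are made up for illustration:

```sql
-- Sketch only: @Dates and @Log are hypothetical stand-ins.
DECLARE @Dates TABLE ([date] DATE PRIMARY KEY);
DECLARE @Log   TABLE (ClientIp VARCHAR(15), logtime DATETIME, bytes BIGINT);

INSERT INTO @Dates VALUES ('20130212'), ('20130213');
INSERT INTO @Log VALUES
    ('10.0.0.5', '2013-02-12T08:15:00', 75),
    ('10.0.0.5', '2013-02-12T17:40:00', 50);

SELECT d.[date],
       ISNULL(SUM(r.bytes), 0) AS TotalBytes
FROM @Dates AS d
LEFT JOIN @Log AS r
  ON d.[date] = CAST(r.logtime AS DATE)   -- drop the time part before comparing
GROUP BY d.[date]
ORDER BY d.[date];
-- Returns 2013-02-12 -> 125 and 2013-02-13 -> 0
```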

Related

How to add a set of dates for each category in a dimension?

I have data that looks like this where there is a monthly count of a particular animal for each month. By default, it aggregates in the month where there is data.
However, I would like to have a default set of dates for each animal up to the current month, with 0 if there's no data. Desired Result -
Is there a way to handle this in SQL Server and not in Excel?
Much appreciated in advance.
You can generate the months you want using a numbers table or recursive CTE (or calendar table). Then cross join with the animals to generate the rows and use left join to bring in the existing data:
with dates as (
select min(date) as dte
from t
union all
select dateadd(month, 1, dte)
from dates
where dte < getdate()
)
select a.animal, d.dte, coalesce(t.monthly_count, 0) as monthly_count
from dates d cross join
(select distinct animal from t) a left join
t
on t.date = d.dte and t.animal = a.animal
order by a.animal, d.dte;

Operand clash with date table and varchar date strings in cross join

This follows on from my previous question, "Daily snapshot table using cte loop". Since I tried to simplify there, I appear to have missed something.
I am trying to set up the below cross join between dates and an employee table. I need a daily count according to division and department, but the dates won't link easily since the dates are stored as varchar (not my choice, I can't change it).
I now have a date table that includes a style112 (yyyymmdd) key that I can link to the table, but there seems to be a failure somewhere along the joins.
I'm so close, but really am lost! I have never had to work with string dates and wouldn't wish it upon anyone.
DECLARE @DATESTART AS Date = '20180928';
DECLARE @DATEEND AS Date = '20181031';
WITH Dates AS (
SELECT @DATESTART AS Dte
UNION ALL
SELECT DTE + 1
FROM Dates
WHERE Dte <= @DATEEND )
SELECT
Dt.Dte
,CAST(DTC.Style112 AS VARCHAR)
,Emp.Division_Description
,Emp.Department_Description
,(SELECT
COUNT(*)
FROM ASS_D_EmpMaster_Live E
WHERE
E.[Start_Date] <= CAST(DTC.Style112 AS VARCHAR)
AND (E.Leaving_Date > CAST(DTC.Style112 AS VARCHAR)
OR E.Leaving_Date = '00000000')
) Counts
FROM Dates Dt
LEFT JOIN ASS_C_DateConversions DTC
ON DTC.[Date] = Dt.DtE
CROSS JOIN
(
SELECT DISTINCT
Division_Description
,Department_Description
FROM
ASS_D_EmpMaster_Live e
) Emp
OPTION (MAXRECURSION 1000)
Desired output:
Date Dept1 Dept2 Dept3
20180901 25 231 154
20180902 23 232 154
I don't think you need the conversion table at all and I would remove it. And I believe the subquery should look like this:
SELECT COUNT(*)
FROM ASS_D_EmpMaster_Live E
WHERE
CAST(E.Start_Date AS DATE) <= Dt.Dte
AND (CAST(E.Leaving_Date AS DATE) > Dt.Dte OR E.Leaving_Date = '00000000')
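One caveat with casting the string columns: CAST('00000000' AS DATE) raises a conversion error, and SQL Server does not promise to evaluate the OR branches in a particular order. A hedged alternative, assuming SQL Server 2012 or later, is TRY_CAST, which returns NULL for strings that are not valid dates. The table variable below is a hypothetical stand-in for ASS_D_EmpMaster_Live.

```sql
-- Sketch only: @Emp stands in for ASS_D_EmpMaster_Live.
DECLARE @Emp TABLE ([Start_Date] VARCHAR(8), Leaving_Date VARCHAR(8));

INSERT INTO @Emp VALUES
    ('20180101', '00000000'),  -- still employed
    ('20180915', '20181010'),  -- left on 10 Oct 2018
    ('20181101', '00000000');  -- starts after the day being counted

DECLARE @day DATE = '20181001';

SELECT COUNT(*) AS HeadCount
FROM @Emp AS E
WHERE TRY_CAST(E.[Start_Date] AS DATE) <= @day
  AND (E.Leaving_Date = '00000000'
       OR TRY_CAST(E.Leaving_Date AS DATE) > @day);  -- NULL for '00000000', no error
-- HeadCount = 2
```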

How do I find available date ranges from date ranges

Sql Server
I have already added bookings to my hotel room management system's reservation data. I want a SQL query that retrieves the date ranges in which rooms are available, and I also want to check whether a specific date range is available.
You can use something like the following. It's not an easy query, so I'll try to explain it as simply as possible.
1. Use a recursive CTE to generate dates from a specified start date to a specified end date.
2. Join each date to the different room IDs you might have in your table to create all potential available dates.
3. Determine which dates are unavailable for each room.
4. Determine which dates are available for each room by joining all potential available dates and removing unavailable ones (point 2 vs 3).
5. Determine how to group by each range (I used a ROW_NUMBER with a DENSE_RANK).
6. Display results in intervals, for each room.
Script:
-- Period to consider
DECLARE @StartDate DATE = '2018-06-20'
DECLARE @EndDate DATE = '2018-09-01'
;WITH GeneratedDates AS
(
SELECT
GeneratedDate = @StartDate
UNION ALL
SELECT
GeneratedDate = DATEADD(DAY, 1, G.GeneratedDate)
FROM
GeneratedDates AS G
WHERE
G.GeneratedDate < @EndDate
),
ExistingRooms AS
(
SELECT DISTINCT
RoomId
FROM
HotelReservation.dbo.Reservation AS R
),
UnavailableDatesByRoom AS
(
SELECT DISTINCT
R.RoomID,
UnavailableDate = G.GeneratedDate
FROM
HotelReservation.dbo.Reservation AS R
INNER JOIN GeneratedDates AS G ON G.GeneratedDate BETWEEN R.CheckIn AND R.CheckOut
),
AvailableDaysByRoom AS
(
SELECT
AvailableDate = G.GeneratedDate,
E.RoomID,
DateRanking = ROW_NUMBER() OVER (PARTITION BY E.RoomID ORDER BY G.GeneratedDate ASC)
FROM
GeneratedDates AS G
CROSS JOIN ExistingRooms AS E
WHERE
NOT EXISTS (
SELECT
'unavailable date for that room'
FROM
UnavailableDatesByRoom AS U
WHERE
U.RoomID = E.RoomID AND
G.GeneratedDate = U.UnavailableDate)
),
AvailableDaysByRoomGroupings AS
(
SELECT
A.*,
MagicRanking = DENSE_RANK() OVER (PARTITION BY A.RoomID ORDER BY DateRanking - DATEDIFF(DAY, '2010-01-01', A.AvailableDate))
FROM
AvailableDaysByRoom AS A
)
SELECT
G.RoomID,
FirstAvailableStartDate = MIN(G.AvailableDate),
LastAvailableStartDate = MAX(G.AvailableDate)
FROM
AvailableDaysByRoomGroupings AS G
GROUP BY
G.RoomID,
G.MagicRanking
ORDER BY
G.RoomID,
FirstAvailableStartDate
OPTION
(MAXRECURSION 32000)
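The grouping step is the least obvious part of the script. Within a run of consecutive dates, the day number and the row number both grow by one per row, so their difference stays constant and can serve as a group key. The following reduced sketch shows the idea for a single room, using a made-up list of free days:

```sql
-- Sketch only: @FreeDays is a hypothetical list of available dates for one room.
DECLARE @FreeDays TABLE (AvailableDate DATE PRIMARY KEY);

INSERT INTO @FreeDays VALUES
    ('20180620'), ('20180621'), ('20180622'),   -- first run of consecutive days
    ('20180701'), ('20180702');                 -- second run

WITH Numbered AS (
    SELECT AvailableDate,
           -- Constant within each run of consecutive dates, different between runs.
           GroupKey = DATEDIFF(DAY, '20100101', AvailableDate)
                      - ROW_NUMBER() OVER (ORDER BY AvailableDate)
    FROM @FreeDays
)
SELECT RangeStart = MIN(AvailableDate),
       RangeEnd   = MAX(AvailableDate)
FROM Numbered
GROUP BY GroupKey
ORDER BY RangeStart;
-- Returns 2018-06-20 to 2018-06-22 and 2018-07-01 to 2018-07-02
```

The anchor date '20100101' is arbitrary; any fixed date gives the same grouping.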

adding a row for missing data

I'm calculating a running balance over the date range 2017-02-01 to 2017-02-10.
There are days with missing data. How would I include these missing dates with the previous day's balance?
Example data:
We are missing data for 2017-02-04, 2017-02-05 and 2017-02-06. How would I add a row in the query with the previous balance?
The date range is a parameter, so it could change.
Can I use something like the LAG function?
I would be inclined to use a recursive CTE and then fill in the values. Here is one approach using outer apply:
with dates as (
select mind as dte, mind, maxd
from (select min(date) as mind, max(date) as maxd from t) t
union all
select dateadd(day, 1, dte), mind, maxd
from dates
where dte < maxd
)
select d.dte, t.balance
from dates d outer apply
(select top 1 t.*
from t
where t.date <= d.dte
order by t.date desc
) t;
You can generate dates using a tally table as below:
Declare @d1 date ='2017-02-01'
Declare @d2 date ='2017-02-10'
;with cte_dates as (
Select top (datediff(D, @d1, @d2)+1) Dates = Dateadd(day, Row_Number() over (order by (Select NULL))-1, @d1) from
master..spt_values s1, master..spt_values s2
)
Select * from cte_dates left join ....
Then left join to your table and compute the running total.
Adding to the date range & CTE solutions, I have created Date Dimension tables in numerous databases where I just left join to them.
There are free scripts online to create date dimension tables for SQL Server. I highly recommend them. Plus, a date dimension makes aggregation by other time periods (e.g. quarter, month, year) much more efficient.

SQL for counting events by date

I feel like I've seen this question asked before, but neither the SO search nor google is helping me... maybe I just don't know how to phrase the question. I need to count the number of events (in this case, logins) per day over a given time span so that I can make a graph of website usage. The query I have so far is this:
select
count(userid) as numlogins,
count(distinct userid) as numusers,
convert(varchar, entryts, 101) as date
from
usagelog
group by
convert(varchar, entryts, 101)
This does most of what I need (I get a row per date as the output containing the total number of logins and the number of unique users on that date). The problem is that if no one logs in on a given date, there will not be a row in the dataset for that date. I want it to add in rows indicating zero logins for those dates. There are two approaches I can think of for solving this, and neither strikes me as very elegant.
Add a column to the result set that lists the number of days between the start of the period and the date of the current row. When I'm building my chart output, I'll keep track of this value and if the next row is not equal to the current row plus one, insert zeros into the chart for each of the missing days.
Create a "date" table that has all the dates in the period of interest and outer join against it. Sadly, the system I'm working on already has a table for this purpose that contains a row for every date far into the future... I don't like that, and I'd prefer to avoid using it, especially since that table is intended for another module of the system and would thus introduce a dependency on what I'm developing currently.
Any better solutions or hints at better search terms for google? Thanks.
Frankly, I'd do this programmatically when building the final output. You're essentially trying to read something from the database which is not there (data for days that have no data). SQL isn't really meant for that sort of thing.
If you really want to do that, though, a "date" table seems your best option. To make it a bit nicer, you could generate it on the fly, using, e.g., your DB's date functions and a derived table.
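In T-SQL, generating the dates on the fly from a derived table might look like the sketch below. The start date and the 31-day window are assumptions chosen for illustration.

```sql
DECLARE @start DATE = '20240101';  -- hypothetical first day of the period

SELECT DATEADD(DAY, nums.n, @start) AS the_date
FROM (
    -- Two digit lists cross joined give the numbers 0 to 99.
    SELECT ones.d + 10 * tens.d AS n
    FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS ones(d)
    CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS tens(d)
) AS nums
WHERE nums.n < 31          -- keep 31 days: 1 Jan to 31 Jan 2024
ORDER BY the_date;
```

The result can be used as the left side of an outer join against the usage log, so no permanent date table is needed.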
I had to do exactly the same thing recently. This is how I did it in T-SQL (YMMV on speed, but I've found it performant enough over a couple of million rows of event data):
DECLARE @DaysTable TABLE ( [Year] INT, [Day] INT )
DECLARE @StartDate DATETIME
SET @StartDate = whatever
WHILE (@StartDate <= GETDATE())
BEGIN
INSERT INTO @DaysTable ( [Year], [Day] )
SELECT DATEPART(YEAR, @StartDate), DATEPART(DAYOFYEAR, @StartDate)
SELECT @StartDate = DATEADD(DAY, 1, @StartDate)
END
-- This gives me a table of all days since whenever
-- (you could select @StartDate as the minimum date of your usage log)
SELECT days.Year, days.Day, ISNULL(events.NumEvents, 0) AS NumEvents
FROM @DaysTable AS days
LEFT JOIN (
SELECT
COUNT(*) AS NumEvents,
DATEPART(YEAR, LogDate) AS [Year],
DATEPART(DAYOFYEAR, LogDate) AS [Day]
FROM LogData
GROUP BY
DATEPART(YEAR, LogDate),
DATEPART(DAYOFYEAR, LogDate)
) AS events ON days.Year = events.Year AND days.Day = events.Day
Create a memory table (a table variable) where you insert your date ranges, then outer join the logins table against it. Group by your start date, then you can perform your aggregations and calculations.
The strategy I normally use is to UNION with the opposite of the query, generally a query that retrieves data for rows that don't exist.
If I wanted to get the average mark for a course, but some courses weren't taken by any students, I'd need to UNION with those not taken by anyone to display a row for every class:
SELECT AVG(mark), course FROM `marks` GROUP BY course
UNION
SELECT NULL, course FROM courses WHERE course NOT IN
(SELECT course FROM marks)
Your query will be more complex, but the same principle should apply. You may indeed need a table of dates for your second query.
Option 1
You can create a temp table, insert the dates within the range, and do a left outer join with the usagelog.
Option 2
You can programmatically insert the missing dates while evaluating the result set to produce the final output.
WITH q(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
qq(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
dates AS
(
SELECT q.n * 100 + qq.n AS ndate
FROM q, qq
)
SELECT COUNT(userid) as numlogins,
COUNT(DISTINCT userid) as numusers,
CAST('2000-01-01' AS DATETIME) + ndate as date
FROM dates
LEFT JOIN
usagelog
ON entryts >= CAST('2000-01-01' AS DATETIME) + ndate
AND entryts < CAST('2000-01-01' AS DATETIME) + ndate + 1
GROUP BY
ndate
This will select up to 10,000 dates constructed on the fly, which should be enough for about 27 years.
By default, SQL Server allows at most 100 recursion levels per CTE, which is why each inner query returns at most 100 rows.
If you need more than 10,000, just add a third CTE qqq(n) and cross-join with it in dates.
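As an alternative to cross joining several small CTEs, the recursion limit can be raised for a single statement with the MAXRECURSION query hint, which accepts values from 0 (no limit) to 32767. A minimal sketch, with an assumed one-year range:

```sql
DECLARE @start DATE = '20000101',
        @end   DATE = '20001231';   -- hypothetical range: all of the leap year 2000

WITH dates AS (
    SELECT @start AS d
    UNION ALL
    SELECT DATEADD(DAY, 1, d)
    FROM dates
    WHERE d < @end
)
SELECT COUNT(*) AS n
FROM dates
OPTION (MAXRECURSION 366);  -- 365 recursive steps are needed here
-- n = 366
```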