Sum net changes between snapshots in transactional fact - sql

I have a transactional fact table with irregular effectiveFrom and effectiveTo dates, with a cumulative total and net changes per ID over time.
Now I want to sum the net changes between two snapshots, for example 2012-01-01 and 2015-07-01; the sum should include all rows on and between the snapshot dates.
I would like to select the relevant rows using the snapshot dates so that I can perform a sum on NC_Total grouped on ID.
What is the most efficient way to do this? Can I create a table-valued function for this?
This is how the table looks when selected for ID IN (1,2):
This is the result set I should get back using the snapshot dates:
[image: Select rows on snapshot date]
And this is how I would like to finally sum it:
[image: Sum of snapshot dates]

I think you just want inequalities in a where clause:
select t.*
from t
where effectiveTo >= '2012-01-01' and
startFrom <= '2015-06-01'
Then you can use aggregation:
select id, sum(nc_total) as sum_net_changes
from t
where effectiveTo >= '2012-01-01' and
startFrom <= '2015-06-01'
group by id;
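To make the overlap logic concrete, here is a minimal sketch that runs the same kind of filter and aggregation against an in-memory SQLite database from Python. The table layout, the column name effectiveFrom, and the sample rows are assumptions for illustration only, not the asker's real data. A row is kept when its interval touches the snapshot window, that is, when it ends on or after the first snapshot and starts on or before the second.

```python
import sqlite3

# Hypothetical sample rows: (id, effectiveFrom, effectiveTo, nc_total).
ROWS = [
    (1, "2011-06-01", "2011-12-31", 100),  # ends before the first snapshot
    (1, "2012-01-01", "2013-03-14", 10),
    (1, "2013-03-15", "2015-08-01", 5),
    (2, "2014-02-01", "2015-06-30", 7),
    (2, "2015-07-02", "2016-01-01", 50),   # starts after the second snapshot
]


def sum_net_changes(rows, snap_start, snap_end):
    """Sum nc_total per id over rows whose interval overlaps the snapshots."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (id INTEGER, effectiveFrom TEXT, "
        "effectiveTo TEXT, nc_total INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    # ISO-8601 date strings sort correctly as text, so plain
    # inequalities are enough for the overlap test.
    result = con.execute(
        "SELECT id, SUM(nc_total) FROM t "
        "WHERE effectiveTo >= ? AND effectiveFrom <= ? "
        "GROUP BY id ORDER BY id",
        (snap_start, snap_end),
    ).fetchall()
    con.close()
    return result
```

Calling sum_net_changes(ROWS, "2012-01-01", "2015-07-01") drops the two rows that lie entirely outside the window and sums the rest per id.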

So I ended up writing a table-valued function for this, but the gist of the selection had less to do with SQL than with thinking through the date logic. The solution was not as elegant as I would have wanted:
DECLARE @t1 DATETIME = '2012-01-01'
DECLARE @t2 DATETIME = '2015-07-01'
AND (EffectiveFrom >= @t1 AND EffectiveFrom <= @t2) -- All transactions between the snapshots

Related

PL-SQL query to calculate customers per period from start and stop dates

I have a PL-SQL table with a structure as shown in the example below:
I have customers (customer_number) with insurance cover start and stop dates (cover_start_date and cover_stop_date). I also have dates of accidents for those customers (accident_date). These customers may have more than one row in the table if they have had more than one accident. They may also have no accidents. And they may also have a blank entry for the cover stop date if their cover is ongoing. Sorry I did not design the data format, but I am stuck with it.
I am looking to calculate the number of accidents (num_accidents) and number of customers (num_customers) in a given time period (period_start), and from that the number of accidents-per-customer (which will be easy once I've got those two pieces of information).
Any ideas on how to design a PL-SQL function to do this in a simple way? Ideally with the time periods not being fixed to monthly (for example, weekly or fortnightly too)? Ideally I will end up with a table like this shown below:
Many thanks for any pointers...
You seem to need a list of dates. You can generate one in the query and then use correlated subqueries to calculate the columns you want:
select d.*,
(select count(distinct customer_number)
from t
where t.cover_start_date <= d.dte and
(t.cover_stop_date > d.dte + interval '1' month or t.cover_stop_date is null)
) as num_customers,
(select count(*)
from t
where t.accident_date >= d.dte and
t.accident_date < d.dte + interval '1' month
) as accidents,
(select count(distinct customer_number)
from t
where t.accident_date >= d.dte and
t.accident_date < d.dte + interval '1' month
) as num_customers_with_accident
from (select date '2020-01-01' as dte from dual union all
select date '2020-02-01' as dte from dual union all
. . .
) d;
If you want to do arithmetic on the columns, you can use this as a subquery or CTE.
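Since the question also asks for periods that are not fixed to a month (weekly or fortnightly), the following is a rough sketch of the same counting logic in Python with an arbitrary period length. The row layout (customer_number, cover_start_date, cover_stop_date, accident_date), the function names, and the sample data are assumptions for illustration; the counting rules mirror the query above (a customer counts if cover started on or before the period start and is either open-ended or stops after the period end).

```python
from datetime import date, timedelta

# Hypothetical rows: (customer_number, cover_start, cover_stop, accident_date).
# cover_stop is None for ongoing cover; accident_date is None for no accident.
ROWS = [
    (1, date(2020, 1, 1), None, date(2020, 1, 10)),
    (1, date(2020, 1, 1), None, date(2020, 1, 20)),
    (2, date(2020, 1, 5), date(2020, 1, 25), None),
]


def period_starts(first, last, step):
    """Yield period start dates from first to last, step apart."""
    current = first
    while current <= last:
        yield current
        current += step


def summarize(rows, first, last, step):
    """Return (period_start, num_customers, num_accidents) per period."""
    summary = []
    for start in period_starts(first, last, step):
        end = start + step
        customers = {
            cust for cust, cover_start, cover_stop, _ in rows
            if cover_start <= start and (cover_stop is None or cover_stop > end)
        }
        accidents = [
            acc for _, _, _, acc in rows
            if acc is not None and start <= acc < end
        ]
        summary.append((start, len(customers), len(accidents)))
    return summary
```

For example, summarize(ROWS, date(2020, 1, 6), date(2020, 1, 20), timedelta(days=14)) produces one tuple per fortnight; accidents per customer is then the third value divided by the second.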

SQL query that can create a row for skipped months

I have a table that I join to a calendar table, but I need to create a new row for each month in between, i.e.
Date GIS CODE Running Total Open
2007-04-30 BEJOORDING, 6566, WESTERN AUSTRALIA 5
2007-09-30 BEJOORDING, 6566, WESTERN AUSTRALIA 6
I need some sort of query that can create end of month date rows between 2007-04-30 and 2007-09-30.
I will then need to fill down the blanks with the most recent fields so I will have a timeline for all end of month values.
I am assuming I will have to use some sort of CTE, but I am not the best at this and don't understand exactly how they work.
Any help will be greatly appreciated.
This CTE query will give you a table with all end-of-month values between the first and last ones in your table (which I've assumed is called log). You can then LEFT JOIN that to the table to create rows for all months in the timespan.
WITH CTE AS (
SELECT MIN(Date) AS [Date], MAX(Date) AS Max_Date FROM log
UNION ALL
SELECT EOMONTH(DATEADD(MONTH, 1, [Date])), Max_Date
FROM CTE
WHERE Date < Max_Date
)
SELECT Date
FROM CTE
This can also be achieved with a RIGHT JOIN to a subquery against the calendar table, using the EOMONTH() function.
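The two steps the question asks for (generate every end-of-month date in the range, then fill down the most recent value) can be sketched outside the database as well. This is only an illustration in Python, assuming the known values arrive as a dict keyed by end-of-month date; the function names are made up for this example.

```python
import calendar
from datetime import date


def month_ends(first, last):
    """Return the last day of every month from first's month to last's month."""
    year, month = first.year, first.month
    ends = []
    while (year, month) <= (last.year, last.month):
        last_day = calendar.monthrange(year, month)[1]
        ends.append(date(year, month, last_day))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return ends


def fill_down(known):
    """Map every month end in the range to the most recent known value."""
    filled = {}
    current = None
    for month_end in month_ends(min(known), max(known)):
        current = known.get(month_end, current)
        filled[month_end] = current
    return filled
```

With the two rows from the question, fill_down({date(2007, 4, 30): 5, date(2007, 9, 30): 6}) yields six month-end entries, with May through August carrying the April value forward.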

What is the fastest way to group a DateTime column by Date in T-SQL

I have an older sql 2005 box, and I need to do some summaries of a table with ~500m rows.
I have a datetime column in the table and I want to get just the date out of it for output and group by. I know there are a few ways to do this, but what is the absolute fastest?
Thanks
I suspect the fastest would be to:
SELECT
the_day = DATEADD(DAY, the_day, '19000101'),
the_count
FROM
(
SELECT
the_day = DATEDIFF(DAY, '19000101', [the_datetime_column]),
the_count = COUNT(*)
FROM dbo.the_table
WHERE ...
GROUP BY DATEDIFF(DAY, '19000101', [the_datetime_column])
) AS x;
But "fastest" is relative here, and it will depend largely on the indexes on the table, how you're filtering out rows, etc. You will want to test this against other typical date truncation methods, such as CONVERT(CHAR(8), [the_datetime_column], 112).
What you could consider - depending on whether this query is more important than write performance - is adding a persisted computed column with an index, or an indexed view, that would help this aggregation for you at write time instead of query time.
I imagine you can get a slightly better performance this way.
SELECT cast(cast([<yourdate>]+.5 as int) as datetime) as [yourdate], count(*) as count
FROM <yourtable>
GROUP BY cast([<yourdate>]+.5 as int)
You can improve this once you upgrade to SQL Server 2008, which adds the date type.
SELECT cast([<yourdate>] as date) as [yourdate], count(*) as count
FROM <yourtable>
GROUP BY cast([<yourdate>] as date)

Select data from SQL DB per day

I have a table with order information in an E-commerce store. Schema looks like this:
[Orders]
Id|SubTotal|TaxAmount|ShippingAmount|DateCreated
This table only contains a row for each order. So if a day goes by without any orders, there is no sales data for that day.
I would like to select subtotal-per-day for the last 30 days, including those days with no sales.
The resultset would look like this:
Date | SalesSum
2009-08-01 | 15235
2009-08-02 | 0
2009-08-03 | 340
2009-08-04 | 0
...
Doing this, only gives me data for those days with orders:
select DateCreated as Date, sum(ordersubtotal) as SalesSum
from Orders
group by DateCreated
You could create a table called Dates, select from that table, and join the Orders table. But I really want to avoid that, because it doesn't work well enough when dealing with different time zones and things...
Please don't laugh. SQL is not my kind of thing... :)
Create a function that can generate a date table as follows:
(stolen from http://www.codeproject.com/KB/database/GenerateDateTable.aspx)
Create Function dbo.fnDateTable
(
@StartDate datetime,
@EndDate datetime,
@DayPart char(5) -- support 'day','month','year','hour', default 'day'
)
Returns @Result Table
(
[Date] datetime
)
As
Begin
Declare @CurrentDate datetime
Set @CurrentDate=@StartDate
While @CurrentDate<=@EndDate
Begin
Insert Into @Result Values (@CurrentDate)
Select @CurrentDate=
Case
When @DayPart='year' Then DateAdd(yy,1,@CurrentDate)
When @DayPart='month' Then DateAdd(mm,1,@CurrentDate)
When @DayPart='hour' Then DateAdd(hh,1,@CurrentDate)
Else
DateAdd(dd,1,@CurrentDate)
End
End
Return
End
Then, join against that table
SELECT dates.Date as Date, sum(SubTotal+TaxAmount+ShippingAmount)
FROM [fnDateTable] (dateadd("m",-1,CONVERT(VARCHAR(10),GETDATE(),111)),CONVERT(VARCHAR(10),GETDATE(),111),'day') dates
LEFT JOIN Orders
ON dates.Date = DateCreated
GROUP BY dates.Date
declare @oldest_date datetime
declare @daily_sum numeric(18,2)
declare @temp table(
sales_date datetime,
sales_sum numeric(18,2)
)
select @oldest_date = dateadd(day,-30,getdate())
while @oldest_date <= getdate()
begin
set @daily_sum = (select sum(SubTotal) from SalesTable where DateCreated = @oldest_date)
insert into @temp(sales_date, sales_sum) values(@oldest_date, @daily_sum)
set @oldest_date = dateadd(day,1,@oldest_date)
end
select * from @temp
OK - I missed that 'last 30 days' part. The bit above, while not as clean, IMHO, as the date table, should work. Another variant would be to use the while loop to fill a temp table just with the last 30 days and do a left outer join with the result of my original query.
including those days with no sales.
That's the difficult part. I don't think the first answer will help you with that. I did something similar to this with a separate date table.
You can find the directions on how to do so here:
Date Table
I have a Log table with an indexed LogID column from which I never delete any records; its values run from 1 to ~10000000. Using this table I can write
select
s.ddate, SUM(isnull(o.SubTotal,0))
from
(
select
cast(datediff(d,LogID,getdate()) as datetime) AS ddate
from
Log
where
LogID <31
) s left join orders o on o.orderdate = s.ddate
group by s.ddate
I actually did this today. We also have an e-commerce application. I don't want to fill our database with "useless" dates. I just do the group by, create all the days for the last N days in Java, and pair them with the date/sales results from the database.
Where is this ultimately going to end up? I ask only because it may be easier to fill in the empty days with whatever program is going to deal with the data instead of trying to get it done in SQL.
SQL is a wonderful language, and it is capable of a great many things, but sometimes you're just better off working the finer points of the data in the program instead.
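As an illustration of filling the gaps in application code, here is a minimal Python sketch. It assumes the database query has already returned one sum per day that had orders, collected into a dict keyed by date; the function name and the sample figures (taken from the question's example output) are for illustration only.

```python
from datetime import date, timedelta


def fill_daily_sales(sales_by_day, today, days=30):
    """Return (day, sales) for the last `days` days, using 0 for missing days."""
    start = today - timedelta(days=days - 1)
    filled = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        filled.append((day, sales_by_day.get(day, 0)))
    return filled


# Hypothetical query result: only days with orders are present.
SALES = {date(2009, 8, 1): 15235, date(2009, 8, 3): 340}
```

fill_daily_sales(SALES, date(2009, 8, 4), days=4) returns four rows, with zeros for 2009-08-02 and 2009-08-04. Because the dates are built in the application, the caller also decides which time zone defines "today".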
(Revised a bit--I hit enter too soon)
I started poking at this, and as it hits some pretty tricky SQL concepts it quickly grew into the following monster. If feasible, you might be better off adapting THEn's solution; or, as many others advise, using application code to fill in the gaps could be preferable.
-- A table variable holding the 30 dates that you want to check
DECLARE @Foo Table (Date smalldatetime not null)
-- Populate the table using a common "tally table" methodology (I got this from SQL Server magazine long ago)
;WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows
L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows
L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows
Tally AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS Number FROM L3)
INSERT @Foo (Date)
select dateadd(dd, datediff(dd, 0, dateadd(dd, -number + 1, getdate())), 0)
from Tally
where Number < 31
Step 1 is to build a table containing the 30 dates that you are concerned with. That abstract weirdness is about the fastest way known to build a table of consecutive integers; add a few more subqueries, and you can populate millions or more in mere seconds. I take the first 30, and use dateadd and the current date/time to convert them into dates. If you already have a "fixed" table that has 1-30, you can use that and skip the CTE entirely (by replacing table "Tally" with your table).
The outer two date function calls remove the time portion of the generated value.
(Note that I assume that your order date also has no time portion -- otherwise you've got another common problem to resolve.)
For testing purposes I built table #Orders, and this gets you the rest:
SELECT f.Date, sum(ordersubtotal) as SalesSum
from @Foo f
left outer join #Orders o
on o.DateCreated = f.Date
group by f.Date
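The row counts in the tally-table step above grow by squaring: each level cross-joins the previous level with itself. A small Python sketch of that idea follows; it only imitates the CTE with lists and is not how SQL Server executes it, and the function names are invented for this example.

```python
from datetime import date, timedelta
from itertools import product


def tally(levels=3):
    """Imitate the stacked cross joins: 2 rows, then 4, 16, 256, ..."""
    rows = [1, 1]  # L0: two rows
    for _ in range(levels):
        # Cross join the current level with itself, squaring the row count.
        rows = [1 for _ in product(rows, rows)]
    # ROW_NUMBER() equivalent: consecutive integers starting at 1.
    return list(range(1, len(rows) + 1))


def last_n_dates(today, n=30):
    """Turn the first n tally numbers into dates, newest first."""
    return [today - timedelta(days=number - 1) for number in tally() if number <= n]
```

Three levels give 256 numbers, which is more than enough for 30 dates; each extra level squares the count again.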
I created the Function DateTable as JamesMLV pointed out to me.
And then the SQL looks like this:
SELECT dates.date, ISNULL(SUM(ordersubtotal), 0) as Sales FROM [dbo].[DateTable] ('2009-08-01','2009-08-31','day') dates
LEFT JOIN Orders ON CONVERT(VARCHAR(10),Orders.datecreated, 111) = dates.date
group by dates.date
SELECT DateCreated,
SUM(SubTotal) AS SalesSum
FROM Orders
GROUP BY DateCreated

SQL for counting events by date

I feel like I've seen this question asked before, but neither the SO search nor google is helping me... maybe I just don't know how to phrase the question. I need to count the number of events (in this case, logins) per day over a given time span so that I can make a graph of website usage. The query I have so far is this:
select
count(userid) as numlogins,
count(distinct userid) as numusers,
convert(varchar, entryts, 101) as date
from
usagelog
group by
convert(varchar, entryts, 101)
This does most of what I need (I get a row per date as the output containing the total number of logins and the number of unique users on that date). The problem is that if no one logs in on a given date, there will not be a row in the dataset for that date. I want it to add in rows indicating zero logins for those dates. There are two approaches I can think of for solving this, and neither strikes me as very elegant.
Add a column to the result set that lists the number of days between the start of the period and the date of the current row. When I'm building my chart output, I'll keep track of this value and if the next row is not equal to the current row plus one, insert zeros into the chart for each of the missing days.
Create a "date" table that has all the dates in the period of interest and outer join against it. Sadly, the system I'm working on already has a table for this purpose that contains a row for every date far into the future... I don't like that, and I'd prefer to avoid using it, especially since that table is intended for another module of the system and would thus introduce a dependency on what I'm developing currently.
Any better solutions or hints at better search terms for google? Thanks.
Frankly, I'd do this programmatically when building the final output. You're essentially trying to read something from the database which is not there (data for days that have no data). SQL isn't really meant for that sort of thing.
If you really want to do that, though, a "date" table seems your best option. To make it a bit nicer, you could generate it on the fly, e.g. using your DB's date functions and a derived table.
I had to do exactly the same thing recently. This is how I did it in T-SQL (YMMV on speed, but I've found it performant enough over a couple of million rows of event data):
DECLARE @DaysTable TABLE ( [Year] INT, [Day] INT )
DECLARE @StartDate DATETIME
SET @StartDate = whatever
WHILE (@StartDate <= GETDATE())
BEGIN
INSERT INTO @DaysTable ( [Year], [Day] )
SELECT DATEPART(YEAR, @StartDate), DATEPART(DAYOFYEAR, @StartDate)
SELECT @StartDate = DATEADD(DAY, 1, @StartDate)
END
-- This gives me a table of all days since whenever
-- (you could set @StartDate to the minimum date of your usage log)
SELECT days.Year, days.Day, events.NumEvents
FROM @DaysTable AS days
LEFT JOIN (
SELECT
COUNT(*) AS NumEvents,
DATEPART(YEAR, LogDate) AS [Year],
DATEPART(DAYOFYEAR, LogDate) AS [Day]
FROM LogData
GROUP BY
DATEPART(YEAR, LogDate),
DATEPART(DAYOFYEAR, LogDate)
) AS events ON days.Year = events.Year AND days.Day = events.Day
Create a memory table (a table variable) where you insert your date ranges, then outer join the logins table against it. Group by your start date, then you can perform your aggregations and calculations.
The strategy I normally use is to UNION with the opposite of the query, generally a query that retrieves data for rows that don't exist.
If I wanted to get the average mark for a course, but some courses weren't taken by any students, I'd need to UNION with those not taken by anyone to display a row for every class:
SELECT AVG(mark), course FROM `marks` GROUP BY course
UNION
SELECT NULL, course FROM courses WHERE course NOT IN
(SELECT course FROM marks)
Your query will be more complex, but the same principle should apply. You may indeed need a table of dates for your second query.
Option 1
You can create a temp table, insert the dates within the range, and do a left outer join with the usagelog.
Option 2
You can programmatically insert the missing dates while evaluating the result set to produce the final output.
WITH q(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
qq(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
dates AS
(
SELECT q.n * 100 + qq.n AS ndate
FROM q, qq
)
SELECT COUNT(userid) as numlogins,
COUNT(DISTINCT userid) as numusers,
CAST('2000-01-01' AS DATETIME) + ndate as date
FROM dates
LEFT JOIN
usagelog
ON entryts >= CAST('2000-01-01' AS DATETIME) + ndate
AND entryts < CAST('2000-01-01' AS DATETIME) + ndate + 1
GROUP BY
ndate
This will select up to 10,000 dates constructed on the fly, which should be enough for about 27 years.
SQL Server has a default limit of 100 recursion levels per CTE (adjustable with OPTION (MAXRECURSION n)), which is why the inner queries return up to 100 rows each.
If you need more than 10,000, just add a third CTE qqq(n) and cross-join with it in dates.
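For readers who want to try the generate-dates-then-left-join idea without a SQL Server instance, here is a rough sketch using SQLite from Python. It is a swapped-in variant, not the query above: SQLite stands in for SQL Server, a single recursive CTE replaces the two cross-joined ones, and the sample log entries are made up. The usagelog table and its userid and entryts columns follow the question.

```python
import sqlite3

# Hypothetical log entries: (userid, entryts).
ENTRIES = [
    (1, "2009-08-01 09:00:00"),
    (1, "2009-08-01 17:30:00"),
    (2, "2009-08-01 12:00:00"),
    (3, "2009-08-03 08:15:00"),
]


def logins_per_day(entries, first_day, last_day):
    """Return (day, numlogins, numusers) for every day in the range."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE usagelog (userid INTEGER, entryts TEXT)")
    con.executemany("INSERT INTO usagelog VALUES (?, ?)", entries)
    rows = con.execute(
        """
        WITH RECURSIVE dates(d) AS (
            SELECT ?
            UNION ALL
            SELECT date(d, '+1 day') FROM dates WHERE d < ?
        )
        SELECT d, COUNT(u.userid), COUNT(DISTINCT u.userid)
        FROM dates
        LEFT JOIN usagelog AS u ON date(u.entryts) = d
        GROUP BY d
        ORDER BY d
        """,
        (first_day, last_day),
    ).fetchall()
    con.close()
    return rows
```

Because COUNT(u.userid) ignores the NULLs produced by the left join, days without logins come back with zeros rather than being dropped.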