How to calculate average in date column - sql

I don't know how to calculate the average age of a column of type date in SQL Server.

You can use datediff() and aggregation. Assuming that your date column is called dt in table mytable, and that you want the average age in years over the whole table, then you would do:
select avg(datediff(year, dt, getdate())) avg_age
from mytable
You can change the first argument to datediff() (which is called the date part), to any other supported value depending on what you actually mean by age; for example datediff(day, dt, getdate()) gives you the difference in days.
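One caveat worth sketching: in SQL Server, AVG over an integer expression returns an integer, so the query above truncates the average to whole years. A minimal variant, assuming the same mytable and dt names and treating a year as 365.25 days (an approximation, not an exact calendar rule), averages the age in days and converts afterwards:

```sql
-- Average age in fractional years.
-- Assumption: 365.25 days per year is close enough for reporting purposes.
SELECT AVG(DATEDIFF(day, dt, GETDATE()) / 365.25) AS avg_age_years
FROM mytable;
```

Dividing by a decimal literal makes the expression numeric, so AVG keeps the fractional part.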

First, let's calculate the age in years correctly. See the comments in the code, with the understanding that DATEDIFF does NOT calculate age. It only counts the number of datepart boundaries crossed between the two values.
--===== Local obviously named variables defined and assigned
DECLARE @StartDT DATETIME = '2019-12-31 23:59:59.997'
       ,@EndDT   DATETIME = '2020-01-01 00:00:00.000'
;
--===== Show the difference in milliseconds between the two date/times
     -- Because of the rounding that DATETIME does on 3.3ms resolution, this will return 4ms,
     -- which certainly does NOT depict an age of 1 year.
SELECT DATEDIFF(ms,@StartDT,@EndDT)
;
--===== This solution will mistakenly return an age of 1 year for the dates given,
     -- which are only about 4ms apart according to the SELECT above.
SELECT IncorrectAgeInYears = DATEDIFF(YEAR, @StartDT, @EndDT)
;
--===== This calculates the age in years correctly in T-SQL.
     -- If the anniversary date has not yet occurred, 1 year is subtracted.
SELECT CorrectAgeInYears = DATEDIFF(yy, @StartDT, @EndDT)
     - IIF(DATEADD(yy, DATEDIFF(yy, @StartDT, @EndDT), @StartDT) > @EndDT, 1, 0)
;
Now, let's turn that correct calculation into an inline Table Valued Function that returns a single scalar value, producing a really high speed "Inline Scalar Function".
CREATE FUNCTION [dbo].[AgeInYears]
(
 @StartDT DATETIME, --Date of birth or date of manufacture or start date.
 @EndDT   DATETIME  --Usually, GETDATE() or CURRENT_TIMESTAMP but
                    --can be any date source like a column that has an end date.
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT AgeInYears = DATEDIFF(yy, @StartDT, @EndDT)
     - IIF(DATEADD(yy, DATEDIFF(yy, @StartDT, @EndDT), @StartDT) > @EndDT, 1, 0)
;
Then, to Dale's point, let's create a test table and populate it. This one is a little overkill for this problem but it's also useful for a lot of different examples. Don't let the million rows scare you... this runs in just over 2 seconds on my laptop including the Clustered Index creation.
--===== Create and populate a large test table on-the-fly.
-- "SomeInt" has a range of 1 to 50,000 numbers
-- "SomeLetters2" has a range of "AA" to "ZZ"
-- "SomeDecimal" has a range of 10.00 to 100.00 numbers
-- "SomeDate" has a range of >=01/01/2000 & <01/01/2020 whole dates
-- "SomeDateTime" has a range of >=01/01/2000 & <01/01/2020 Date/Times
-- "SomeRand" contains the value of RAND just to show it can be done without a loop.
-- "SomeHex9" contains 9 hex digits from NEWID()
-- "SomeFluff" is a fixed width CHAR column just to give the table a little bulk.
SELECT TOP 1000000
SomeInt = ABS(CHECKSUM(NEWID())%50000) + 1
,SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID())%26) + 65)
+ CHAR(ABS(CHECKSUM(NEWID())%26) + 65)
,SomeDecimal = CAST(RAND(CHECKSUM(NEWID())) * 90 + 10 AS DECIMAL(9,2))
,SomeDate = DATEADD(dd, ABS(CHECKSUM(NEWID())%DATEDIFF(dd,'2000','2020')), '2000')
,SomeDateTime = DATEADD(dd, DATEDIFF(dd,0,'2000'), RAND(CHECKSUM(NEWID())) * DATEDIFF(dd,'2000','2020'))
,SomeRand = RAND(CHECKSUM(NEWID())) --CHECKSUM produces an INT and is MUCH faster than conversion to VARBINARY.
,SomeHex9 = RIGHT(NEWID(),9)
,SomeFluff = CONVERT(CHAR(170),'170 CHARACTERS RESERVED') --Just to add a little bulk to the table.
INTO dbo.JBMTest
FROM sys.all_columns ac1 --Cross Join forms up to 16 million rows
CROSS JOIN sys.all_columns ac2 --Pseudo Cursor
;
GO
--===== Add a non-unique Clustered Index to SomeDateTime for this demo.
CREATE CLUSTERED INDEX IXC_Test ON dbo.JBMTest (SomeDateTime ASC)
;
Now, let's find the average age of those million rows, as represented by the SomeDateTime column.
SELECT AvgAgeInYears = AVG(age.AgeInYears )
,RowsCounted = COUNT(*)
FROM dbo.JBMTest tst
CROSS APPLY dbo.AgeInYears(SomeDateTime,GETDATE()) age
;

tsql grouping with duplication based on variable

I want to create some aggregations from a table but I am not able to figure out a solution.
Example table:
DECLARE @MyTable TABLE(person INT, the_date date, the_value int)
INSERT INTO @MyTable VALUES
(1,'2017-01-01', 10),
(1,'2017-02-01', 5),
(1,'2017-03-01', 5),
(1,'2017-04-01', 10),
(1,'2017-05-01', 2),
(2,'2017-04-01', 10),
(2,'2017-05-01', 10),
(2,'2017-05-01', 0),
(3,'2017-01-01', 2)
For each person existing at that time, I want to average the value for the last x (@months_back) months given some starting date (@start_date):
DECLARE @months_back int, @start_date date
set @months_back = 3
set @start_date = '2017-05-01'
SELECT person, avg(the_value) as avg_the_value
FROM @MyTable
where the_date <= @start_date and the_date >= dateadd(month, -@months_back, @start_date)
group by person
This works. I now want to do the same thing again but skip back some months (@month_skip) from the starting date. Then I want to union those two tables together. Then, I again want to skip back @month_skip months from this date and do the same thing. I want to continue doing this until I have skipped back to some specified date (@min_date).
DECLARE @months_back int, @month_skip int, @start_date date, @min_date date
set @months_back = 3
set @month_skip = 2
set @start_date = '2017-05-01'
set @min_date = '2017-03-01'
Using the above variables and the table @MyTable the result should be:
person | avg_the_value
1 | 5
2 | 6
1 | 6
3 | 2
Only one skip is made here since @min_date is 2 months back but I would like to be able to do multiple skips based on what @min_date is.
This example table is simple but the real one has many more automatically created columns and therefore it is not feasible to use a table variable where I would have to declare the scheme of the resulting table.
I asked a related question Here but did not manage to get any of the answers to work for this problem.
It sounds like what you're trying to do is the following:
Starting with a date (e.g. 2017-05-01), look back @months_back months and define a range of dates. For example, if we go 3 months back, we're defining a range from 2017-02-01 through 2017-05-01.
After we define this range, we go back to our starting date and define a new starting date, going back @month_skip months. For example, with an initial starting date of 2017-05-01, we might skip back 2 months, giving us a new starting date of 2017-03-01.
We take this new starting date, and define a range of corresponding dates (as we did above). This produces the range 2016-12-01 through 2017-03-01.
We repeat this as needed through the minimum date specified, to produce a list of date ranges we want to do calculations for:
2017-02-01 through 2017-05-01
2016-12-01 through 2017-03-01
... etc ...
For each of these periods, look at a person and calculate the average of their value.
The query below should do what is described above: rather than taking a value and iterating back to calculate previous values, we use a numbers table to calculate offsets on an interval, which is used to determine the ending and starting dates for each interval/period. This query was built using SQL Server 2008 R2 and should be compatible with later versions.
/* Table, data, variable declarations */
DECLARE @MyTable TABLE(person INT, the_date date, the_value int)
INSERT INTO @MyTable VALUES
(1,'2017-01-01', 10),
(1,'2017-02-01', 5),
(1,'2017-03-01', 5),
(1,'2017-04-01', 10),
(1,'2017-05-01', 2),
(2,'2017-04-01', 10),
(2,'2017-05-01', 10),
(2,'2017-05-01', 0),
(3,'2017-01-01', 2)
DECLARE @months_back int, @month_skip int, @start_date date, @min_date date
set @months_back = 3
set @month_skip = 2
set @start_date = '2017-05-01'
set @min_date = '2017-01-01'
/* Common table expression to build list of Integers */
/* reference http://www.itprotoday.com/software-development/build-numbers-table-you-need if you want more info */
declare @end_int bigint = 50
; WITH IntegersTableFill (ints) AS
(
    SELECT
        CAST(0 AS BIGINT) AS 'ints'
    UNION ALL
    SELECT (T.ints + 1) AS 'ints'
    FROM IntegersTableFill T
    WHERE ints <= (
        CASE
            WHEN (@end_int <= 32767) THEN @end_int
            ELSE 32767
        END
    )
)
/* What we're going to do is define a series of periods.
These periods have a start date and an end date, and will simplify grouping
(in place of the calculate-and-union approach)
*/
/* Now, we start defining the periods
@start_date defines the end of the range we need to calculate for.
@month_skip defines the amount of time we have to jump back for each period
*/
/* Using the number table we defined above and the data in our variables, calculate start and end dates */
,periodEndDates as
(
    select ints as Period
        ,DATEADD(month, -(@months_back*ints), @start_date) as endOfPeriod
    from IntegersTableFill itf
)
,periodStartDates as
(
    select *
        ,DATEADD(month, -(@month_skip), endOfPeriod) as startOfPeriod
    from periodEndDates
)
,finalPeriodData as
(
    select (period) as period, startOfPeriod, endOfPeriod from periodStartDates
)
/* Link the entries in our original data to the periods they fall into */
/* NOTE: The join criteria originally specified allows values to fall into multiple periods.
You may want to fix this?
*/
,periodTableJoin as
(
    select * from finalPeriodData fpd
    inner join @MyTable mt
        on mt.the_date >= fpd.startOfPeriod
        and mt.the_date <= fpd.endOfPeriod
        and mt.the_date >= @min_date
        and mt.the_date <= @start_date
)
)
/* Calculate averages, grouping by period and person */
,periodValueAggregate as
(
select person, avg(the_value) as avg_the_value from
periodTableJoin
group by period, person
)
select * from periodValueAggregate
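As a cross-check of the period arithmetic described in the steps of this answer, here is a minimal sketch that only lists the period boundaries, without joining to the data. It follows the prose literally: each period ends @month_skip months before the previous one and reaches @months_back months back from its own end, and periods stop once the end date falls before @min_date. The CTE names n and periods and the cap of 100 periods are assumptions made for illustration, and the values are taken from the question.

```sql
DECLARE @months_back int  = 3,
        @month_skip  int  = 2,
        @start_date  date = '2017-05-01',
        @min_date    date = '2017-03-01';

WITH n AS (
    -- Small inline numbers list: 0, 1, 2, ... (100 is an arbitrary cap)
    SELECT TOP (100)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int) AS i
    FROM sys.all_columns
),
periods AS (
    -- Each period ends @month_skip months before the previous one
    SELECT i AS period,
           DATEADD(month, -@month_skip * i, @start_date) AS endOfPeriod
    FROM n
)
SELECT period,
       DATEADD(month, -@months_back, endOfPeriod) AS startOfPeriod,
       endOfPeriod
FROM periods
WHERE endOfPeriod >= @min_date   -- stop once the period end passes @min_date
ORDER BY period;
```

With these values the sketch should list two periods: 2017-02-01 through 2017-05-01 and 2016-12-01 through 2017-03-01.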
The method I propose is set-based, not iterative.
(I am not following your problem exactly, but please follow along and we can iron out any discrepancies)
Essentially, you are looking to divide a calendar up into periods of interest. The periods are all equal in width and are sequential.
For this, I propose you build a calendar table and mark the periods using division as illustrated in the code:
DECLARE @CalStart DATE = '2017-01-01'
       ,@CalEnd DATE = '2018-01-01'
       ,@CalWindowSize INT = 2
;WITH Numbers AS
(
    SELECT TOP (DATEDIFF(MONTH, @CalStart, @CalEnd)) N = CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) - 1
    FROM syscolumns
)
SELECT CalWindow = N / @CalWindowSize
      ,CalDate = DATEADD(MONTH, N, @CalStart)
FROM Numbers
Once you have correctly configured the variables, you should have a calendar that represents the windows of interest.
It is then a matter of joining this calendar to your dataset and grouping by not only the person but the CalWindow too:
DECLARE @MyTable TABLE(person INT, the_date date, the_value int)
INSERT INTO @MyTable VALUES
(1,'2017-01-01', 10),
(1,'2017-02-01', 5),
(1,'2017-03-01', 5),
(1,'2017-04-01', 10),
(1,'2017-05-01', 2),
(2,'2017-04-01', 10),
(2,'2017-05-01', 10),
(2,'2017-05-01', 0),
(3,'2017-01-01', 2)
----------------------------------
-- Build Calendar
----------------------------------
DECLARE @CalStart DATE = '2017-01-01'
       ,@CalEnd DATE = '2018-01-01'
       ,@CalWindowSize INT = 2
;WITH Numbers AS
(
    SELECT TOP (DATEDIFF(MONTH, @CalStart, @CalEnd)) N = CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) - 1
    FROM syscolumns
)
,Calendar AS
(
    SELECT CalWindow = N / @CalWindowSize
          ,CalDate = DATEADD(MONTH, N, @CalStart)
    FROM Numbers
)
SELECT TB.Person
      ,AVG(TB.the_value)
FROM @MyTable TB
JOIN Calendar CL ON TB.the_date = CL.CalDate
GROUP BY CL.CalWindow, TB.person
Hope I have understood your problem.

Why is this select so slow (SqlServer)?

We have 3 nested selects that are each creating a temporary table. The outer two go very fast. But the inner one (below) sometimes takes about 1/4 second to execute. It's creating a table with 7 rows, each holding a date:
declare @StartDate datetime
declare @EndDate datetime
select @StartDate = cast(@Weeks_Loop_TheDate as date), @EndDate = cast((@Weeks_Loop_TheDate + 6) as date)
declare @temp3 table
(
    TheDate datetime
)
while (@StartDate<=@EndDate)
begin
    insert into @temp3
    values (@StartDate )
    select @StartDate=DATEADD(dd,1,@StartDate)
end
select * from @temp3
The params are set with a DateTime variable so the cast shouldn't be significant. And the populating should be trivial and fast. So any ideas why it's slow?
And is there a better way to do this? We need to get back a result set that is 7 dates in this range.
thanks - dave
Wouldn't this work? Loops/cursors are slow in SQL Server compared to set operations.
DECLARE @StartDate DATE = '2017-05-03';
SELECT DATEADD(DAY, RowNr, @StartDate)
FROM (SELECT ROW_NUMBER () OVER (ORDER BY object_id) - 1 AS RowNr FROM sys.objects) AS T
WHERE T.RowNr < 7;
The subquery generates a sequence of numbers from 0 to n - 1, where n is the number of objects in your database (that is always going to be more than 7, and if not, you can just CROSS JOIN inside).
Then just use DATEADD to add these generated numbers to the start date.
Finally, limit the number of days you want to add in your WHERE clause.
And if you're going to use this quite often, you can wrap it up in an Inline Table-Valued Function.
CREATE FUNCTION dbo.DateTable (
    @p1 DATE,
    @p2 INT)
RETURNS TABLE
AS RETURN
SELECT DATEADD(DAY, RowNr, @p1) AS TheDate
FROM (SELECT ROW_NUMBER () OVER (ORDER BY object_id) - 1 AS RowNr FROM sys.objects) AS T
WHERE T.RowNr < @p2;
GO
And then query it like that:
SELECT *
FROM dbo.DateTable ('2017-05-03', 7);
Result in both cases:
+------------+
| TheDate |
+------------+
| 2017-05-03 |
| 2017-05-04 |
| 2017-05-05 |
| 2017-05-06 |
| 2017-05-07 |
| 2017-05-08 |
| 2017-05-09 |
+------------+
Yet another useful tool is a Numbers table. It can be created just like that (source: http://dataeducation.com/you-require-a-numbers-table/) :
CREATE TABLE Numbers
(
Number INT NOT NULL,
CONSTRAINT PK_Numbers
PRIMARY KEY CLUSTERED (Number)
WITH FILLFACTOR = 100
)
INSERT INTO Numbers
SELECT
(a.Number * 256) + b.Number AS Number
FROM
(
SELECT number
FROM master..spt_values
WHERE
type = 'P'
AND number <= 255
) a (Number),
(
SELECT number
FROM master..spt_values
WHERE
type = 'P'
AND number <= 255
) b (Number)
GO
Then you would not have to use ROW_NUMBER() and your function could be as follows:
ALTER FUNCTION dbo.DateTable (
    @p1 DATE,
    @p2 INT)
RETURNS TABLE
AS RETURN
SELECT DATEADD(DAY, Number, @p1) AS TheDate
FROM Numbers
WHERE Number < @p2;
GO
This is going to work like a charm and Numbers table could be reused in many other scenarios where you need a sequence of numbers to do some sort of calculations.
It shouldn't take more than a millisecond to run your script. There must be a server issue that requires investigation.
That said, this operation can be done as a more efficient set-based operation instead of looping. The example below uses a CTE to generate the number sequence. A utility numbers table facilitates set-based processing like this so I suggest you create a permanent table with a sequence of numbers (with number as the primary key) to improve performance further.
DECLARE @StartDate date = @Weeks_Loop_TheDate;
WITH numbers(n) AS (
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) - 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0)) AS a(b)
)
SELECT DATEADD(day, n, @StartDate)
FROM numbers
ORDER BY n;
Rather than using a table variable such as @temp, use a temp table (#) instead. The query optimizer doesn't optimize table variables well.
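The temp-table suggestion can be combined with the set-based approach from the other answers. The following is only a sketch: the literal start date stands in for the asker's @Weeks_Loop_TheDate, and the inline VALUES list (which needs SQL Server 2008 or later) replaces the loop.

```sql
-- Sample date; in the asker's code this would come from @Weeks_Loop_TheDate
DECLARE @StartDate datetime = '2017-05-03';

-- Temp table instead of a table variable
CREATE TABLE #temp3 (TheDate datetime NOT NULL);

-- One set-based insert of 7 consecutive days instead of a WHILE loop
INSERT INTO #temp3 (TheDate)
SELECT DATEADD(day, v.n, @StartDate)
FROM (VALUES (0),(1),(2),(3),(4),(5),(6)) AS v(n);

SELECT TheDate FROM #temp3 ORDER BY TheDate;

DROP TABLE #temp3;
```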

Filter time values that are a set period either side of a specified time

Given a specified time value and an interval value:
Specified Time: 13:25:00
Interval Value: 00:20:00
How can I filter the following table of values to return times that are the specified Interval either side of the Specified Time?
12:45:24
13:05:00
13:50:30
14:50:32
15:15:10
I want a function or query to check if '13:25:00' has '00:20:00' difference with any of the times in table.
The output should return:
13:05:00
Based on the information you have provided, I assume you want to get values from the list that are the specified period either side of your "special time".
Here's one way to do it using DATEADD:
-- temp table for your sample data
CREATE TABLE #times ( val TIME )
INSERT INTO #times
( val )
VALUES ( '12:45:24' ),
( '13:05:00' ),
( '13:50:30' ),
( '14:50:32' ),
( '15:15:10' )
DECLARE @special_time TIME = '13:25:00'
DECLARE @diff_value TIME = '00:20:00'
-- variable will hold the total number of seconds for your interval
DECLARE @diff_in_seconds INT
-- gets the total number of seconds of your interval -> @diff_value
SELECT @diff_in_seconds = DATEPART(SECOND, @diff_value) + 60
       * DATEPART(MINUTE, @diff_value) + 3600 * DATEPART(HOUR, @diff_value)
-- get the values that match the criteria
SELECT *
FROM #times
WHERE val = DATEADD(SECOND, @diff_in_seconds, @special_time)
   OR val = DATEADD(SECOND, -( @diff_in_seconds ), @special_time)
DROP TABLE #times
Note that the WHERE clause filters the results by adding and subtracting the difference. The subtraction is achieved by making @diff_in_seconds negative.
If we are understanding your question correctly, you want all the times that are bigger than 20 minutes from your given (special) time.
To achieve this, just do a select with a where clause that contains a condition looking like this: abs(datediff(minute, tableDate, @specialDate)) > 20
SQLFiddle sample and code example:
declare @specialDate datetime = '1900-01-01 13:25:00'
select *
from SampleData
where abs(datediff(minute, SomeDate, @specialDate)) > 20
Note that I set the dates of the Datetime columns to 1900-01-01 as an obscure reference, adjust according to your settings.
You will need the ABS in the line to make sure that both variants of the resulting datediff are checked (It can either bring back 0, > 0 or < 0)
References:
MSDN: DATEDIFF
MSDN: ABS
Here is a solution:
create table t(t time);
insert into t
values
('12:45:24'),
('13:05:00'),
('13:50:30'),
('14:50:32'),
('15:15:10')
declare @st time = '13:25:00'
declare @dt time = '00:20:00'
select * from t
where abs(datediff(ss, t, @st)) - datediff(ss, '00:00:00', @dt) = 0
abs(datediff(ss, t, @st)) holds the difference in seconds between each time in the table and the special time. You compare this difference to the interval expressed in seconds, i.e. the difference between 00:00:00 and the interval: datediff(ss, '00:00:00', @dt)
Output:
t
13:05:00.0000000
Fiddle http://sqlfiddle.com/#!6/05df4/1

Converting YYYYMM format to YYYY-MM-DD in SQL Server

I need to perform a query on a large table that has a datetime column that is indexed.
We need to query the data for a range from a month (at a minimum) to multiple months.
This query would be executed from Cognos TM1 and the input would be a period like YYYYMM. My question is - how to convert the YYYYMM input to a format that can be used to query that table (with the index being used).
Let's say if the input is
From Date: '201312'
To Date: '201312'
then, we need convert the same to 'between 01-12-2013 and 31-12-2013' in the query
Since we need this to be hooked up in Cognos TM1, so would not be able to write a procedure or declare variables (TM1 somehow does not like it).
Thanks in advance for your reply.
I would do something like this:
create procedure dbo.getDataForMonth
    @yyyymm char(6) = null
as
--
-- use the current year/month if the year/month was omitted
--
set @yyyymm = case coalesce(@yyyymm,'')
              when '' then convert(char(6),current_timestamp,112)
              else @yyyymm
              end
--
-- this should throw an exception if the date is invalid
--
declare @dtFrom date = convert(date,@yyyymm+'01') -- 1st of specified month
declare @dtThru date = dateadd(month,1,@dtFrom) -- 1st of next month
--
-- your Big Ugly Query Here
--
select *
from dbo.some_table t
where t.date_of_record >= @dtFrom
and t.date_of_record < @dtThru
--
-- That's about all there is to it.
--
return 0
go
Suppose you are getting this value of YYYYMM in a varchar variable @DateFrom.
You can do something like
DECLARE @DateFrom VARCHAR(6) = '201201';
-- Append '01' to any passed string and it will get all
-- records starting from that month in that year
DECLARE @Date VARCHAR(8) = @DateFrom + '01'
-- in your query do something like
SELECT * FROM TableName WHERE DateTimeColumn >= @Date
Passing the date in an ANSI-standard format, i.e. YYYYMMDD, keeps the predicate sargable and allows SQL Server to take advantage of indexes defined on that datetime column.
Here is an article written by Rob Farley about SARGable functions in SQL Server.
Try this...
declare @startdate date,@endate date
select @startdate =convert(date,left('201312',4)+'-'+right('201312',2)+'-01')
select @endate= DATEADD(d, -1, DATEADD(m, DATEDIFF(m, 0, @startdate) + 1, 0))
select convert(date,@startdate,102) startdate,convert(date,@endate,102) endate
In the datasource of your TM1 Turbo Integrator process, you can use parameters in the SQL query. E.g. you could take this SQL query:
SELECT Col1, Col2
FROM Table
WHERE Col1 = 'Green'
AND Col2 < 30
In TM1, to parameterise this, you would create two parameters e.g. P1 and P2 and put them in the query:
SELECT Col1, Col2
FROM Table
WHERE Col1 = '?P1?'
AND Col2 < ?P2?
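Since the question rules out procedures and variables, the conversion can also be written inline. The sketch below reuses the table and column names from the procedure answer (dbo.some_table, date_of_record) and hard-codes '201312' for both ends of the range. In a TM1 query the two literals would presumably be replaced by parameter placeholders in the ?P1? style shown above, which is an assumption and untested here.

```sql
-- Half-open range: from the 1st of the "from" month up to,
-- but not including, the 1st of the month after the "to" month.
SELECT *
FROM dbo.some_table t
WHERE t.date_of_record >= CONVERT(datetime, '201312' + '01', 112)
  AND t.date_of_record <  DATEADD(month, 1, CONVERT(datetime, '201312' + '01', 112));
```

Because the conversion is applied only to the literals and the column is left bare, the predicate stays sargable and the index on the datetime column can still be used.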

How to Determine Values for Missing Months based on Data of Previous Months in T-SQL

I have a set of transactions occurring at specific points in time:
CREATE TABLE Transactions (
TransactionDate Date NOT NULL,
TransactionValue Integer NOT NULL
)
The data might be:
INSERT INTO Transactions (TransactionDate, TransactionValue)
VALUES ('1/1/2009', 1)
INSERT INTO Transactions (TransactionDate, TransactionValue)
VALUES ('3/1/2009', 2)
INSERT INTO Transactions (TransactionDate, TransactionValue)
VALUES ('6/1/2009', 3)
Assuming that the TransactionValue sets some kind of level, I need to know what the level was between the transactions. I need this in the context of a set of T-SQL queries, so it would be best if I could get a result set like this:
Month Value
1/2009 1
2/2009 1
3/2009 2
4/2009 2
5/2009 2
6/2009 3
Note how, for each month, we either get the value specified in the transaction, or we get the most recent non-null value.
My problem is that I have little idea how to do this! I'm only an "intermediate" level SQL Developer, and I don't remember ever seeing anything like this before. Naturally, I could create the data I want in a program, or using cursors, but I'd like to know if there's a better, set-oriented way to do this.
I'm using SQL Server 2008, so if any of the new features will help, I'd like to hear about it.
P.S. If anyone can think of a better way to state this question, or even a better subject line, I'd greatly appreciate it. It took me quite a while to decide that "spread", while lame, was the best I could come up with. "Smear" sounded worse.
I'd start by building a Numbers table holding sequential integers from 1 to a million or so. They come in really handy once you get the hang of it.
For example, here is how to get the 1st of every month in 2008:
select firstOfMonth = dateadd( month, n - 1, '1/1/2008')
from Numbers
where n <= 12;
Now, you can put that together using OUTER APPLY to find the most recent transaction for each date like so:
with Dates as (
select firstOfMonth = dateadd( month, n - 1, '1/1/2008')
from Numbers
where n <= 12
)
select d.firstOfMonth, t.TransactionValue
from Dates d
outer apply (
select top 1 TransactionValue
from Transactions
where TransactionDate <= d.firstOfMonth
order by TransactionDate desc
) t;
This should give you what you're looking for, but you might have to Google around a little to find the best way to create the Numbers table.
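For reference, one common way to build such a table is sketched below. The table name dbo.Numbers and column name n are chosen to match the queries in this answer, and the row source (a cross join of sys.all_columns) is just one convenient option, not the only one.

```sql
-- Numbers table holding 1 .. 1,000,000
CREATE TABLE dbo.Numbers (n int NOT NULL PRIMARY KEY);

INSERT INTO dbo.Numbers (n)
SELECT TOP (1000000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a
CROSS JOIN sys.all_columns b;   -- cross join only supplies enough rows
```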
Here's what I came up with:
declare @Transactions table (TransactionDate datetime, TransactionValue int)
declare @MinDate datetime
declare @MaxDate datetime
declare @iDate datetime
declare @Month int
declare @count int
declare @i int
declare @PrevLvl int
insert into @Transactions (TransactionDate, TransactionValue)
select '1/1/09',1
insert into @Transactions (TransactionDate, TransactionValue)
select '3/1/09',2
insert into @Transactions (TransactionDate, TransactionValue)
select '5/1/09',3
select @MinDate = min(TransactionDate) from @Transactions
select @MaxDate = max(TransactionDate) from @Transactions
set @count=datediff(mm,@MinDate,@MaxDate)
set @i=1
set @iDate=@MinDate
while (@i<=@count)
begin
    set @iDate=dateadd(mm,1,@iDate)
    if (select count(*) from @Transactions where TransactionDate=@iDate) < 1
    begin
        select @PrevLvl = TransactionValue from @Transactions where TransactionDate=dateadd(mm,-1,@iDate)
        insert into @Transactions (TransactionDate, TransactionValue)
        select @iDate, @PrevLvl
    end
    set @i=@i+1
end
select *
from @Transactions
order by TransactionDate
To do it in a set-based way, you need sets for all of your data or information. In this case there's the overlooked data of "What months are there?" It's very useful to have a "Calendar" table as well as a "Number" table in databases as utility tables.
Here's a solution using one of these methods. The first bit of code sets up your calendar table. You can fill it using a cursor or manually or whatever and you can limit it to whatever date range is needed for your business (back to 1900-01-01 or just back to 1970-01-01 and as far into the future as you want). You can also add any other columns that are useful for your business.
CREATE TABLE dbo.Calendar
(
date DATETIME NOT NULL,
is_holiday BIT NOT NULL,
CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED (date)
)
INSERT INTO dbo.Calendar (date, is_holiday) VALUES ('2009-01-01', 1) -- New Year
INSERT INTO dbo.Calendar (date, is_holiday) VALUES ('2009-01-02', 1)
...
Now, using this table your question becomes trivial:
SELECT
CAST(MONTH(date) AS VARCHAR) + '/' + CAST(YEAR(date) AS VARCHAR) AS [Month],
T1.TransactionValue AS [Value]
FROM
dbo.Calendar C
LEFT OUTER JOIN dbo.Transactions T1 ON
T1.TransactionDate <= C.date
LEFT OUTER JOIN dbo.Transactions T2 ON
T2.TransactionDate > T1.TransactionDate AND
T2.TransactionDate <= C.date
WHERE
DAY(C.date) = 1 AND
T2.TransactionDate IS NULL AND
C.date BETWEEN '2009-01-01' AND '2009-12-31' -- You can use whatever range you want
John Gibb posted a fine answer, already accepted, but I wanted to expand on it a bit to:
eliminate the one year limitation,
expose the date range in a more explicit manner, and
eliminate the need for a separate numbers table.
This slight variation uses a recursive common table expression to establish the set of Dates representing the first of each month between the from and to dates defined in DateRange. Note the use of the MAXRECURSION option to cap the recursion depth (the default limit is 100 levels); adjust as necessary to accommodate the maximum number of months expected. Also, consider adding alternative Dates assembly logic to support weeks, quarters, even day-to-day.
with
DateRange(FromDate, ToDate) as (
select
Cast('11/1/2008' as DateTime),
Cast('2/15/2010' as DateTime)
),
Dates(Date) as (
select
Case Day(FromDate)
When 1 Then FromDate
Else DateAdd(month, 1, DateAdd(month, ((Year(FromDate)-1900)*12)+Month(FromDate)-1, 0))
End
from DateRange
union all
select DateAdd(month, 1, Date)
from Dates
where Date < (select ToDate from DateRange)
)
select
d.Date, t.TransactionValue
from Dates d
outer apply (
select top 1 TransactionValue
from Transactions
where TransactionDate <= d.Date
order by TransactionDate desc
) t
option (maxrecursion 120);
If you do this type of analysis often, you might be interested in this SQL Server function I put together for exactly this purpose:
if exists (select * from dbo.sysobjects where name = 'fn_daterange') drop function fn_daterange;
go
create function fn_daterange
(
    @MinDate as datetime,
    @MaxDate as datetime,
    @intval as datetime
)
returns table
--**************************************************************************
-- Procedure: fn_daterange()
-- Author: Ron Savage
-- Date: 12/16/2008
--
-- Description:
-- This function takes a starting and ending date and an interval, then
-- returns a table of all the dates in that range at the specified interval.
--
-- Change History:
-- Date Init. Description
-- 12/16/2008 RS Created.
-- **************************************************************************
as
return
WITH times (startdate, enddate, intervl) AS
(
    SELECT @MinDate as startdate, @MinDate + @intval - .0000001 as enddate, @intval as intervl
    UNION ALL
    SELECT startdate + intervl as startdate, enddate + intervl as enddate, intervl as intervl
    FROM times
    WHERE startdate + intervl <= @MaxDate
)
select startdate, enddate from times;
go
It was an answer to this question, which also has some sample output from it.
I don't have access to BOL from my phone so this is a rough guide...
First, you need to generate the missing rows for the months you have no data. You can either use a OUTER join to a fixed table or temp table with the timespan you want or from a programmatically created dataset (stored proc or suchlike)
Second, you should look at the new SQL 2008 'analytic' functions, like MAX(value) OVER ( partition clause ) to get the previous value.
(I KNOW Oracle can do this 'cause I needed it to calculate compounded interest calcs between transaction dates - same problem really)
Hope this points you in the right direction...
(Avoid throwing it into a temp table and cursoring over it. Too crude!!!)
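The window-function idea can be sketched as follows, with two loud caveats: it relies on ORDER BY inside an aggregate window, which needs SQL Server 2012 or later (so not the asker's 2008), and it assumes at most one transaction per month, dated on the first of the month as in the sample data. The CTE names Months and Joined and the column grp are made up for illustration.

```sql
WITH Months AS (
    -- First of each month, January through June 2009
    SELECT DATEADD(month, v.n, CAST('20090101' AS date)) AS firstOfMonth
    FROM (VALUES (0),(1),(2),(3),(4),(5)) AS v(n)
),
Joined AS (
    SELECT m.firstOfMonth,
           t.TransactionValue,
           -- Running count of non-NULL values: every month without a
           -- transaction lands in the group of the last month that had one
           COUNT(t.TransactionValue) OVER (ORDER BY m.firstOfMonth
                                           ROWS UNBOUNDED PRECEDING) AS grp
    FROM Months m
    LEFT JOIN Transactions t
           ON t.TransactionDate = m.firstOfMonth
)
SELECT firstOfMonth,
       -- Each group holds exactly one non-NULL value, so MAX picks it up
       MAX(TransactionValue) OVER (PARTITION BY grp) AS TransactionValue
FROM Joined
ORDER BY firstOfMonth;
```

The running count only increases on months that have a transaction, which is what carries the most recent value forward into the gaps.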
-----Alternative way------
select
d.firstOfMonth,
MONTH(d.firstOfMonth) as Mon,
YEAR(d.firstOfMonth) as Yr,
t.TransactionValue
from (
select
dateadd( month, inMonths - 1, '1/1/2009') as firstOfMonth
from (
values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12)
) Dates(inMonths)
) d
outer apply (
select top 1 TransactionValue
from Transactions
where TransactionDate <= d.firstOfMonth
order by TransactionDate desc
) t