Group by date and count of entries - sql

I'll make it short, the table looks like this:
| id (int) | registerDate (DATETIME)
|----------|-----------------
| 1 | 2014-07-29 12:00:00
| 2 | 2014-08-01 12:00:00
| 3 | 2014-08-01 12:00:00
| 4 | 2014-08-01 12:00:00
| 5 | 2014-08-02 12:00:00
| 6 | 2014-08-02 12:00:00
| 7 | 2014-08-04 12:00:00
If today is 2014-08-05, I want results like this:
| registerDate (DATETIME) | count (int)
| 2014-08-04 | 1
| 2014-08-03 | 0
| 2014-08-02 | 2
| 2014-08-01 | 3
| 2014-07-31 | 0
| 2014-07-30 | 0
| 2014-07-29 | 1
So I want the count of registered users in the past week (daily).
I searched on Google without success, so I hope you can help.

SELECT registerDate, count(registerDate) FROM [TABLE] WHERE
registerDate between (GETDATE()-7) and GETDATE()
group by registerDate
order by registerDate desc
This will take a table like:
2 |1905-06-26 00:00:00.000
4 |2014-08-03 00:00:00.000
5 |2014-08-02 00:00:00.000
1 |2014-08-01 00:00:00.000
3 |2014-07-01 00:00:00.000
6 |2010-07-01 00:00:00.000
7 |2015-07-01 00:00:00.000
8 |2014-08-28 00:00:00.000
9 |2014-08-26 00:00:00.000
10 |2014-08-26 00:00:00.000
And create:
2014-08-28 00:00:00.000 | 1
2014-08-26 00:00:00.000 | 2
The problem with this is it doesn't show the days that are not in the table.
EDIT:
Here is a more complex version that also returns the days that have no rows:
-- Declare how far back you want to go
DECLARE @DAYSBACK int = 6
-- Select into a temp table, grouping by calendar day rather than by the full datetime
select CONVERT(date, registerDate) as RegDate, count(registerDate) as DateCount
INTO #temptable
from Temp where registerDate between (GETDATE()-@DAYSBACK) and GETDATE()
group by CONVERT(date, registerDate)
-- For each day in the range, check whether a row exists; if not, insert a zero row
WHILE @DAYSBACK >= 0 BEGIN
IF NOT EXISTS (select top 1 1 from #temptable
where RegDate = CONVERT(date, (GETDATE()-@DAYSBACK)))
INSERT INTO #temptable values ((GETDATE()-@DAYSBACK), 0)
SET @DAYSBACK = @DAYSBACK - 1
END
-- Select what you want
select * from #temptable order by RegDate desc
-- Drop the table you created.
DROP TABLE #temptable
Using the same table as above, it will output:
Register Date | Date Count
--------------------------
2014-08-28 | 1
2014-08-27 | 0
2014-08-26 | 2
2014-08-25 | 0
2014-08-24 | 0
2014-08-23 | 0
2014-08-22 | 0
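The zero-filling step does not depend on the SQL dialect. As a rough sketch of the same idea in Python (the function name `daily_counts` is made up for illustration; the rows are the ones from the question), you walk the calendar instead of relying on the rows that happen to exist:

```python
from collections import Counter
from datetime import date, datetime, timedelta

def daily_counts(register_dates, today, days_back=7):
    """Count registrations per calendar day for the `days_back` days before
    `today`, newest first, including days that have no rows (count 0)."""
    # Truncate each datetime to its date so different times share one bucket.
    counts = Counter(d.date() for d in register_dates)
    days = [today - timedelta(days=n) for n in range(1, days_back + 1)]
    # Counter returns 0 for a missing key, which is exactly the gap fill.
    return [(day, counts[day]) for day in days]

# Rows from the question (hypothetical in-memory copy of the table).
rows = [
    datetime(2014, 7, 29, 12), datetime(2014, 8, 1, 12),
    datetime(2014, 8, 1, 12), datetime(2014, 8, 1, 12),
    datetime(2014, 8, 2, 12), datetime(2014, 8, 2, 12),
    datetime(2014, 8, 4, 12),
]
result = daily_counts(rows, today=date(2014, 8, 5))
for day, n in result:
    print(day, n)
```

In SQL the equivalent of the `days` list is a calendar table or a generated series of dates that the counts are left-joined onto.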

Try something like this:
select registerDate = convert(date,t.registerDate) ,
registrations = count(*)
from dbo.my_special_registration_table t
where t.registerDate >= dateadd(day,-7,convert(date,getdate()))
group by convert(date,t.registerDate)
order by 1
If you try to filter out registrations older than 7 days using something like datediff():
where datediff(day,t.registerDate,getdate()) <= 7
you turn the column registerDate into an expression. As a result, the query optimizer can't make use of any indexes that might apply, which forces a table scan. If your table is large, performance is likely to be ... suboptimal.


postgresql query to generate report with multiple columns

I'm having a customer transaction table in postgresql db with the below columns
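| transactionId (primary) | customerId (int8) | transactionDate (timestamp) |
| ----------------------- | ----------------- | --------------------------- |
| 1 | 2 | 2020-02-14 |
| 2 | 3 | 2020-01-08 |
| 3 | 1 | 2020-02-06 |
| 4 | 2 | 2020-02-13 |
| 5 | 2 | 2020-03-24 |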
Need to build a query to create the report of the below
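| CustomerId | FirstTransaction | TotalTransactions | Transactions/Week | RecentTransactions |
| ---------- | ---------------- | ----------------- | ----------------- | ------------------ |
| 1 | 2020-02-06 | 1 | 1 | 2020-02-06 |
| 3 | 2020-01-08 | 1 | 1 | 2020-01-08 |
| 2 | 2020-02-13 | 3 | 2 | 2020-03-24 |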
That is, for each customer: the date of the first transaction, the total number of transactions, the frequency per week, and the date of the most recent transaction.
The report should contain only the last 3 months of records.
Try the following:
with cte as
(
select
*,
count(*) over (partition by customerId) as totalTransactions,
1 + floor((extract(day from transactionDate) - 1) / 7) as transactionsWeek
from myTable
where transactionDate >= '2020-01-01'
and transactionDate <= '2020-03-31'
)
select
customerId,
min(transactionDate) as firstTransaction,
max(totalTransactions) as totalTransactions,
max(transactionDate) as recentTransactions,
(ceil(avg(totalTransactions)/count(distinct transactionsWeek))::int) as "Transactions/Week"
from cte
group by
customerId
order by
customerId
Output:
| customerid | firsttransaction | totaltransactions | recenttransactions | Transactions/Week |
| ---------- | ------------------------ | ----------------- | ------------------------ | ----------------- |
| 1 | 2020-02-06 | 1 | 2020-02-06 | 1 |
| 2 | 2020-02-13 | 3 | 2020-03-24 | 2 |
| 3 | 2020-01-08 | 1 | 2020-01-08 | 1 |
For the last three months, you can also use the following in the WHERE condition:
transactionDate > CURRENT_DATE - INTERVAL '3 months'
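To see where the Transactions/Week figure in the query above comes from, here is a small Python sketch of the same arithmetic (the function names are made up for illustration). It mirrors the query as written: each date is bucketed by week of the month, and the total is divided by the number of distinct buckets, rounded up.

```python
import math
from datetime import date

def week_of_month(d):
    # Same bucketing as 1 + floor((extract(day ...) - 1) / 7):
    # days 1-7 -> 1, days 8-14 -> 2, days 15-21 -> 3, and so on.
    return 1 + (d.day - 1) // 7

def transactions_per_week(dates):
    # ceil(total transactions / number of distinct week buckets)
    weeks = {week_of_month(d) for d in dates}
    return math.ceil(len(dates) / len(weeks))

# Customer 2 from the sample data.
customer_2 = [date(2020, 2, 14), date(2020, 2, 13), date(2020, 3, 24)]
print(sorted({week_of_month(d) for d in customer_2}))  # [2, 4]
print(transactions_per_week(customer_2))               # 2
```

Note that the bucket is the week within the month, so transactions in week 2 of January and week 2 of February land in the same bucket. If you need true calendar weeks, bucket on year and week number instead.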

Repeating ID based on

I have a very simple requirement but I'm struggling to find a way around this.
I have a very simple query:
SELECT
ServiceCode,
StartDate,
Available,
Nights,
BookingID
FROM #tmpAvailability A
LEFT JOIN vwRSBooking B
ON B.Depart = A.StartDate
AND B.ServiceCode = A.SupplierCode
AND B.StatusID IN (2640, 2621)
ORDER BY StartDate;
Made up of 2 tables
#tmpAvailability which consists of the following fields:
SupplierCode
StartDate
Available
vwRSBooking which consists of the following fields
BookingID
DepartDate
Code
Nights
StatusID
Departure and startdate can be joined to link the first day, and the servicecode and suppliercode can be joined to make sure that the availability is linked to the same supplier.
Which produces an output like this:
Code | Dates | Available | Nights | BookingID
TEST | 2018-01-04 | 1 | NULL | NULL
TEST | 2018-01-05 | 1 | NULL | NULL
TEST | 2018-01-06 | 0 | 4 | 123456
TEST | 2018-01-07 | 0 | NULL | NULL
TEST | 2018-01-08 | 0 | NULL | NULL
TEST | 2018-01-09 | 0 | NULL | NULL
TEST | 2018-01-10 | 1 | NULL | NULL
TEST | 2018-01-11 | 1 | NULL | NULL
TEST | 2018-01-12 | 1 | NULL | NULL
TEST | 2018-01-13 | 0 | NULL | 234567
TEST | 2018-01-14 | 0 | NULL | NULL
TEST | 2018-01-15 | 0 | NULL | NULL
What I need is this: when a booking is for 4 nights, the BookingID and the Nights should be repeated on each of the days it covers, for example:
Code | Dates | Available | Nights | BookingID
TEST | 2018-01-04 | 1 | NULL | NULL
TEST | 2018-01-05 | 1 | NULL | NULL
TEST | 2018-01-06 | 0 | 4 | 123456
TEST | 2018-01-07 | 0 | 4 | 123456
TEST | 2018-01-08 | 0 | 4 | 123456
TEST | 2018-01-09 | 0 | 4 | 123456
TEST | 2018-01-10 | 1 | NULL | NULL
TEST | 2018-01-11 | 1 | NULL | NULL
TEST | 2018-01-12 | 1 | NULL | NULL
TEST | 2018-01-13 | 0 | 3 | 234567
TEST | 2018-01-14 | 0 | 3 | 234567
TEST | 2018-01-15 | 0 | 3 | 234567
TEST | 2018-01-16 | 1 | NULL | NULL
If anyone has any ideas on how to solve this, it would be most appreciated.
Andrew
You could replace your vwRSBooking with another view which uses a CTE to obtain all the dates the booking covers. Then use the view's coverdate for joining to the #tmpAvailability table:
CREATE VIEW vwRSBookingFull
AS
WITH cte ( bookingid, nights, depart, code, coverdate)
AS (SELECT bookingid,
nights,
depart,
code,
depart
FROM vwRSBooking
UNION ALL
SELECT c.bookingid,
c.nights,
c.depart,
c.code,
DATEADD(d, 1, c.coverdate)
FROM cte c
WHERE DATEDIFF(d, c.depart, c.coverdate) < (c.nights - 1))
SELECT c.bookingid,
c.nights,
c.depart,
c.code,
c.coverdate
FROM cte c
GO
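What the recursive CTE does is expand each booking into one row per covered date, starting at the departure date and stopping after nights - 1 further days. A minimal Python sketch of that expansion, using the two bookings from the question (the `cover_dates` helper and the dictionary layout are assumptions for illustration only):

```python
from datetime import date, timedelta

def cover_dates(depart, nights):
    """Every date a booking covers: the departure date plus the
    following nights - 1 days, like the coverdate column of the view."""
    return [depart + timedelta(days=n) for n in range(nights)]

bookings = [
    {"bookingid": 123456, "depart": date(2018, 1, 6), "nights": 4},
    {"bookingid": 234567, "depart": date(2018, 1, 13), "nights": 3},
]

# Map each covered date to its booking; joining availability rows on this
# key plays the role of joining #tmpAvailability on the view's coverdate.
covered = {d: b for b in bookings for d in cover_dates(b["depart"], b["nights"])}

for day in sorted(covered):
    print(day, covered[day]["nights"], covered[day]["bookingid"])
```

Dates that are not in `covered` (for example 2018-01-10) correspond to the rows that keep NULL for Nights and BookingID after the left join.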
You will need a calendar table with all the dates in the range your dates may fall into. For this example, I built one for January 2018. We can then join onto this table to create the additional rows.
Here is the sample code I used. You can see it at SQL Fiddle.
CREATE TABLE code (
code varchar(max),
dates date,
available int,
nights int,
bookingid int
)
INSERT INTO code VALUES
('TEST','2018-01-04','1',NULL,NULL),
('TEST','2018-01-05','1',NULL,NULL),
('TEST','2018-01-06','0',4,123456),
('TEST','2018-01-07','0',NULL,NULL),
('TEST','2018-01-08','0',NULL,NULL),
('TEST','2018-01-09','0',NULL,NULL),
('TEST','2018-01-10','1',NULL,NULL),
('TEST','2018-01-11','1',NULL,NULL),
('TEST','2018-01-12','1',NULL,NULL),
('TEST','2018-01-13','0',3,234567),
('TEST','2018-01-14','0',NULL,NULL),
('TEST','2018-01-15','0',NULL,NULL)
CREATE TABLE dates (
dates date
)
INSERT INTO dates VALUES
('2018-01-01'),('2018-01-02'),('2018-01-03'),('2018-01-04'),('2018-01-05'),('2018-01-06'),('2018-01-07'),('2018-01-08'),('2018-01-09'),('2018-01-10'),('2018-01-11'),('2018-01-12'),('2018-01-13'),('2018-01-14'),('2018-01-15'),('2018-01-16'),('2018-01-17'),('2018-01-18'),('2018-01-19'),('2018-01-20'),('2018-01-21'),('2018-01-22'),('2018-01-23'),('2018-01-24'),('2018-01-25'),('2018-01-26'),('2018-01-27'),('2018-01-28'),('2018-01-29'),('2018-01-30'),('2018-01-31')
Here is the query based on this dataset:
SELECT
code.code,
dates.dates,
code.available,
code.nights,
code.bookingid
FROM code
LEFT JOIN dates ON
dates.dates >= code.dates
AND dates.dates < DATEADD(DAY,nights,code.dates)
Edit: Here is an example using your initial query as a subquery to join your result set onto the dates table if you want a copy & paste. Still requires creating the dates table.
SELECT
ServiceCode,
StartDate,
Available,
Nights,
BookingID
FROM (
SELECT
ServiceCode,
StartDate,
Available,
Nights,
BookingID
FROM #tmpAvailability A
LEFT JOIN vwRSBooking B
ON B.Depart = A.StartDate
AND B.ServiceCode = A.SupplierCode
AND B.StatusID IN (2640, 2621)
) code
LEFT JOIN dates ON
dates.dates >= code.StartDate
AND dates.dates < DATEADD(DAY, code.Nights, code.StartDate)
ORDER BY StartDate;

TSQL query help structuring results

I have a table with the following columns:
timestamp | value | desc
example of the data:
2014-01-27 10:00:00.000 | 100 | 101
2014-01-27 10:00:00.000 | 105 | 101
2014-01-27 11:00:00.000 | 160 | 101
2014-01-27 12:00:00.000 | 200 | 101
...
...
2014-01-28 10:00:00.000 | 226 | 101
2014-01-28 10:00:00.000 | 325 | 101
2014-01-28 11:00:00.000 | 145 | 101
What I would like to obtain is a grouping by the hour part, but without merging the same hour from different days.
The result should look like this (in the select I will pass a date interval and a condition on the description, like desc = '101'):
Structure:
hour | count
Data:
10 | 2 (referring to the 20140127)
11 | 1 (referring to the 20140127)
12 | 1 (referring to the 20140127)
...
...
10 | 2 (referring to the 20140128)
11 | 1 (referring to the 20140128)
I thought about using a cursor but I was wondering if it is possible to achieve this result without it.
I'm using SQL server 2012 SP1.
Thanks for your attention.
Bye,
F.
Try this:-
SELECT Count(*) AS [Count],
Datepart(hour, timestamp) AS [Hour]
FROM yourtable
GROUP BY CONVERT(DATE, timestamp),
Datepart(hour, timestamp)
ORDER BY CONVERT(DATE, timestamp)
You may use this. It should work:
SELECT DATEPART(hh,timestamp) AS [Hour], COUNT(*) AS [Count]
FROM tablename
-- desc is a reserved word, so it must be bracketed; filter the rows before grouping
WHERE [desc] = 'yourvalue'
GROUP BY
DATEPART(hh,timestamp),
DATETIMEFROMPARTS (YEAR(timestamp),MONTH(timestamp),DAY(timestamp),0,0,0,0)

Finding simultaneous events in a database between times

I have a database that stores phone call records. Each phone call record has a start time and an end time. I want to find the maximum number of phone calls happening simultaneously, in order to know whether we have exceeded the number of available phone lines in our phone bank. How could I go about solving this problem?
Disclaimer: I'm writing my answer based on the following (excellent) post:
https://www.itprotoday.com/sql-server/calculating-concurrent-sessions-part-3 (Parts 1 and 2 are also recommended)
The first thing to understand about this problem is that most of the solutions found on the internet have one of two issues:
The result is not the correct answer (for example, if range A overlaps with B and C but B doesn't overlap with C, they are counted as 3 overlapping ranges).
The way of computing it is very inefficient (because it is O(n^2) and/or it loops over every second in the period).
The common performance problem in solutions like the one proposed by Unreason is that they are quadratic: for each call you need to check all the other calls to see whether they overlap.
There is a common algorithmic solution that is linear once the events are sorted: list all the "events" (call start and call end) ordered by date, add 1 for a start and subtract 1 for a hang-up, and remember the maximum. That can be implemented easily with a cursor (the solution proposed by Hafhor seems to work that way), but cursors are not the most efficient way to solve problems.
The referenced article has excellent examples, different solutions, and a performance comparison of them. The proposed solution is:
WITH C1 AS
(
SELECT starttime AS ts, +1 AS TYPE,
ROW_NUMBER() OVER(ORDER BY starttime) AS start_ordinal
FROM Calls
UNION ALL
SELECT endtime, -1, NULL
FROM Calls
),
C2 AS
(
SELECT *,
ROW_NUMBER() OVER( ORDER BY ts, TYPE) AS start_or_end_ordinal
FROM C1
)
SELECT MAX(2 * start_ordinal - start_or_end_ordinal) AS mx
FROM C2
WHERE TYPE = 1
Explanation
Suppose we have this set of data:
+-------------------------+-------------------------+
| starttime | endtime |
+-------------------------+-------------------------+
| 2009-01-01 00:02:10.000 | 2009-01-01 00:05:24.000 |
| 2009-01-01 00:02:19.000 | 2009-01-01 00:02:35.000 |
| 2009-01-01 00:02:57.000 | 2009-01-01 00:04:04.000 |
| 2009-01-01 00:04:12.000 | 2009-01-01 00:04:52.000 |
+-------------------------+-------------------------+
The query implements the same idea: add 1 for each call start and subtract 1 for each call end.
SELECT starttime AS ts, +1 AS TYPE,
ROW_NUMBER() OVER(ORDER BY starttime) AS start_ordinal
FROM Calls
This part of the C1 CTE takes the starttime of each call and numbers it:
+-------------------------+------+---------------+
| ts | TYPE | start_ordinal |
+-------------------------+------+---------------+
| 2009-01-01 00:02:10.000 | 1 | 1 |
| 2009-01-01 00:02:19.000 | 1 | 2 |
| 2009-01-01 00:02:57.000 | 1 | 3 |
| 2009-01-01 00:04:12.000 | 1 | 4 |
+-------------------------+------+---------------+
Now this code
SELECT endtime, -1, NULL
FROM Calls
will generate all the "endtimes" without row numbering:
+-------------------------+----+------+
| endtime | | |
+-------------------------+----+------+
| 2009-01-01 00:02:35.000 | -1 | NULL |
| 2009-01-01 00:04:04.000 | -1 | NULL |
| 2009-01-01 00:04:52.000 | -1 | NULL |
| 2009-01-01 00:05:24.000 | -1 | NULL |
+-------------------------+----+------+
Now making the UNION to have the full C1 CTE definition, you will have both tables mixed
+-------------------------+------+---------------+
| ts | TYPE | start_ordinal |
+-------------------------+------+---------------+
| 2009-01-01 00:02:10.000 | 1 | 1 |
| 2009-01-01 00:02:19.000 | 1 | 2 |
| 2009-01-01 00:02:57.000 | 1 | 3 |
| 2009-01-01 00:04:12.000 | 1 | 4 |
| 2009-01-01 00:02:35.000 | -1 | NULL |
| 2009-01-01 00:04:04.000 | -1 | NULL |
| 2009-01-01 00:04:52.000 | -1 | NULL |
| 2009-01-01 00:05:24.000 | -1 | NULL |
+-------------------------+------+---------------+
C2 is computed sorting and numbering C1 with a new column
C2 AS
(
SELECT *,
ROW_NUMBER() OVER( ORDER BY ts, TYPE) AS start_or_end_ordinal
FROM C1
)
+-------------------------+------+-------+--------------+
| ts | TYPE | start | start_or_end |
+-------------------------+------+-------+--------------+
| 2009-01-01 00:02:10.000 | 1 | 1 | 1 |
| 2009-01-01 00:02:19.000 | 1 | 2 | 2 |
| 2009-01-01 00:02:35.000 | -1 | NULL | 3 |
| 2009-01-01 00:02:57.000 | 1 | 3 | 4 |
| 2009-01-01 00:04:04.000 | -1 | NULL | 5 |
| 2009-01-01 00:04:12.000 | 1 | 4 | 6 |
| 2009-01-01 00:04:52.000 | -1 | NULL | 7 |
| 2009-01-01 00:05:24.000 | -1 | NULL | 8 |
+-------------------------+------+-------+--------------+
And this is where the magic happens: at any time, #start - #end is the number of concurrent calls at that moment.
For each TYPE = 1 row (a start event) we have the #start value in the 3rd column, and we also have #start + #end in the 4th column:
#start_or_end = #start + #end
#end = (#start_or_end - #start)
#start - #end = #start - (#start_or_end - #start)
#start - #end = 2 * #start - #start_or_end
so in SQL:
SELECT MAX(2 * start_ordinal - start_or_end_ordinal) AS mx
FROM C2
WHERE TYPE = 1
In this case, with the proposed set of calls, the result is 2.
The referenced article also includes a small improvement that groups the result by, for example, a service, a "phone company" or a "phone central". The same idea can be used to group by time slot and get the maximum concurrency hour by hour for a given day.
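For readers who want to check the reasoning outside the database, here is a small Python sketch of both formulations on the four sample calls above. It is only an illustration, not code from the article, and the function names are made up: one function keeps a running +1/-1 total, the other uses the 2 * start_ordinal - start_or_end_ordinal identity.

```python
from datetime import datetime

calls = [
    (datetime(2009, 1, 1, 0, 2, 10), datetime(2009, 1, 1, 0, 5, 24)),
    (datetime(2009, 1, 1, 0, 2, 19), datetime(2009, 1, 1, 0, 2, 35)),
    (datetime(2009, 1, 1, 0, 2, 57), datetime(2009, 1, 1, 0, 4, 4)),
    (datetime(2009, 1, 1, 0, 4, 12), datetime(2009, 1, 1, 0, 4, 52)),
]

def sorted_events(calls):
    # Tuples sort by time and then by type, so an end (-1) at the same
    # instant comes before a start (+1), like ORDER BY ts, TYPE.
    return sorted([(s, 1) for s, _ in calls] + [(e, -1) for _, e in calls])

def max_concurrent_running_total(calls):
    current = best = 0
    for _, kind in sorted_events(calls):
        current += kind              # +1 on a start, -1 on an end
        best = max(best, current)
    return best

def max_concurrent_ordinals(calls):
    best = start_ordinal = 0
    for start_or_end_ordinal, (_, kind) in enumerate(sorted_events(calls), start=1):
        if kind == 1:                # WHERE TYPE = 1
            start_ordinal += 1
            best = max(best, 2 * start_ordinal - start_or_end_ordinal)
    return best

print(max_concurrent_running_total(calls))  # 2
print(max_concurrent_ordinals(calls))       # 2
```

Both functions sort the 2n events once and then make a single pass, which is the property the set-based query is built to reproduce.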
Given that the maximum number of simultaneous calls is always reached at one of the StartTime points, you can use:
SELECT TOP 1 count(*) as CountSimultaneous
FROM PhoneCalls T1, PhoneCalls T2
WHERE
T1.StartTime between T2.StartTime and T2.EndTime
GROUP BY
T1.CallID
ORDER BY CountSimultaneous DESC
The query will return for each call the number of simultaneous calls. Either order them descending and select first one or SELECT MAX(CountSimultaneous) from the above (as subquery without ordering and without TOP).
try this:
DECLARE @Calls table (callid int identity(1,1), starttime datetime, endtime datetime)
INSERT @Calls (starttime,endtime) values ('6/12/2010 10:10am','6/12/2010 10:15am')
INSERT @Calls (starttime,endtime) values ('6/12/2010 11:10am','6/12/2010 10:25am')
INSERT @Calls (starttime,endtime) values ('6/12/2010 12:10am','6/12/2010 01:15pm')
INSERT @Calls (starttime,endtime) values ('6/12/2010 11:10am','6/12/2010 10:35am')
INSERT @Calls (starttime,endtime) values ('6/12/2010 12:10am','6/12/2010 12:15am')
INSERT @Calls (starttime,endtime) values ('6/12/2010 10:10am','6/12/2010 10:15am')
DECLARE @StartDate datetime
,@EndDate datetime
SELECT @StartDate='6/12/2010'
,@EndDate='6/13/2010'
;with AllDates AS
(
SELECT @StartDate AS DateOf
UNION ALL
SELECT DATEADD(second,1,DateOf) AS DateOf
FROM AllDates
WHERE DateOf<@EndDate
)
SELECT
a.DateOf,COUNT(c.callid) AS CountOfCalls
FROM AllDates a
INNER JOIN @Calls c ON a.DateOf>=c.starttime and a.DateOf<=c.endtime
GROUP BY a.DateOf
ORDER BY 2 DESC
OPTION (MAXRECURSION 0)
OUTPUT:
DateOf CountOfCalls
----------------------- ------------
2010-06-12 10:10:00.000 3
2010-06-12 10:10:01.000 3
2010-06-12 10:10:02.000 3
2010-06-12 10:10:03.000 3
2010-06-12 10:10:04.000 3
2010-06-12 10:10:05.000 3
2010-06-12 10:10:06.000 3
2010-06-12 10:10:07.000 3
2010-06-12 10:10:08.000 3
2010-06-12 10:10:09.000 3
2010-06-12 10:10:10.000 3
2010-06-12 10:10:11.000 3
2010-06-12 10:10:12.000 3
2010-06-12 10:10:13.000 3
2010-06-12 10:10:14.000 3
2010-06-12 10:10:15.000 3
2010-06-12 10:10:16.000 3
2010-06-12 10:10:17.000 3
2010-06-12 10:10:18.000 3
2010-06-12 10:10:19.000 3
2010-06-12 10:10:20.000 3
2010-06-12 10:10:21.000 3
2010-06-12 10:10:22.000 3
2010-06-12 10:10:23.000 3
2010-06-12 10:10:24.000 3
2010-06-12 10:10:25.000 3
2010-06-12 10:10:26.000 3
2010-06-12 10:10:27.000 3
....
Add a TOP 1, or put this query in a derived table and aggregate it further if necessary.
SELECT COUNT(*) FROM calls
WHERE '2010-06-15 15:00:00' BETWEEN calls.starttime AND calls.endtime
and repeat this for every second.
The only practical method I can think of is as follows:
Split the period you want to analyze in arbitrary "buckets", say, 24 1-hour buckets over the day. For each Bucket count how many calls either started or finished between the start or the end of the interval
Note that the 1-hour limit is not a hard-and-fast rule. You could make this shorter or longer, depending on how precise you want the calculation to be.
You could make the actual "length" of the bucket a function of the average call duration.
So, let's assume that your average call is 3 minutes. If it is not too expensive in terms of calculations, use buckets that are 3 times longer than your average call (9 minutes) this should be granular enough to give precise results.
-- assuming calls table with columns starttime and endtime
declare @s datetime, @e datetime;
declare @t table(d datetime);
declare c cursor for select starttime,endtime from calls order by starttime;
open c
while(1=1) begin
fetch next from c into @s,@e
if @@FETCH_STATUS<>0 break;
update top(1) @t set d=@e where d<=@s;
if @@ROWCOUNT=0 insert @t(d) values(@e);
end
close c
deallocate c
select COUNT(*) as MaxConcurrentCalls from @t
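The cursor above treats each row of @t as one phone line and stores the time at which that line becomes free. A call reuses any line that is already free when it starts; otherwise a new line is opened, so the final row count is the number of lines needed. A rough Python sketch of the same greedy idea follows. Note one swap: it uses a min-heap to pick the line that frees up earliest, whereas the cursor updates an arbitrary free row. The function name and the integer timestamps are made up for illustration.

```python
import heapq

def lines_needed(calls):
    """Greedy line assignment over calls sorted by start time.
    Returns how many lines are needed, i.e. the peak concurrency."""
    line_free_at = []  # min-heap of end times, one entry per line opened so far
    for start, end in sorted(calls):
        if line_free_at and line_free_at[0] <= start:
            # A line is already free (d <= @s): reuse it for this call.
            heapq.heapreplace(line_free_at, end)
        else:
            # Every line is busy at `start`: open a new one.
            heapq.heappush(line_free_at, end)
    return len(line_free_at)

# Hypothetical calls as (start, end) minutes since midnight.
calls = [(1, 5), (2, 3), (4, 6), (7, 8)]
print(lines_needed(calls))  # 2
```

As in the cursor version, a call that ends exactly when another starts does not count as overlapping, because the comparison is `<=`.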

Summarising (permanently) data in a SQL table

Greetings, Stackers.
I have a huge number of data-points in a SQL table, and I want to summarise them in a way reminiscent of RRD.
Assuming a table such as
ID | ENTITY_ID | SCORE_DATE | SCORE | SOME_OTHER_DATA
----+-----------+------------+-------+-----------------
1 | A00000001 | 01/01/2010 | 100 | some data
2 | A00000002 | 01/01/2010 | 105 | more data
3 | A00000003 | 01/01/2010 | 104 | various text
... | ......... | .......... | ..... | ...
... | A00009999 | 01/01/2010 | 101 |
... | A00000001 | 02/01/2010 | 104 |
... | A00000002 | 02/01/2010 | 119 |
... | A00000003 | 02/01/2010 | 119 |
... | ......... | .......... | ..... |
... | A00009999 | 02/01/2010 | 101 | arbitrary data
... | ......... | .......... | ..... | ...
... | A00000001 | 01/02/2010 | 104 |
... | A00000002 | 01/02/2010 | 119 |
... | A00000003 | 01/01/2010 | 119 |
I want to end up with one record per entity, per month:
ID | ENTITY_ID | SCORE_DATE | SCORE |
----+-----------+------------+-------+
... | A00000001 | 01/01/2010 | 100 |
... | A00000002 | 01/01/2010 | 105 |
... | A00000003 | 01/01/2010 | 104 |
... | A00000001 | 01/02/2010 | 100 |
... | A00000002 | 01/02/2010 | 105 |
... | A00000003 | 01/02/2010 | 104 |
(I Don't care about the SOME_OTHER_DATA - I'll pick something - either the first or last record probably.)
What's an easy way of doing this on a regular basis, so that anything in the last calendar month is summarised in this way?
At the moment my plan is kind of:
For each EntityID
    For each month
        Find average score for all records in given month
        Update first record with results of previous step
        Delete all records that aren't the first
I can't think of a neat way of doing it though, that doesn't involve lots of updates and iteration.
This can either be done in a SQL Stored Procedure, or it can be incorporated into the .Net app that's generating this data, so the solution doesn't really need to be "one big SQL script", but can be :)
(SQL-2005)
This will give you averages for all of your data:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
To restrict to a given month, e.g., last February, you can do:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where year(SCORE_DATE) = 2010 and month(SCORE_DATE) = 2
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
This version would actually perform better, but the parameters are a little less friendly to deal with:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where SCORE_DATE >= '2/1/2010' and SCORE_DATE < '3/1/2010'
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
If you want a query that always returns last month's data, you can do this:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where year(SCORE_DATE) = year(dateadd(month, -1, getdate())) and month(SCORE_DATE) = month(dateadd(month, -1, getdate()))
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
A better-performing version:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where SCORE_DATE >= dateadd(month, ((year(getdate()) - 1900) * 12) + month(getdate())-2, 0)
and SCORE_DATE < dateadd(month, ((year(getdate()) - 1900) * 12) + month(getdate())-1, 0)
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
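The date arithmetic in that last query is easier to follow outside SQL. It counts whole months since January 1900 and turns the count back into a date, which gives the half-open range from the first day of last month up to (but not including) the first day of this month. A minimal Python sketch of the same calculation (the helper names are assumptions for illustration):

```python
from datetime import date

def first_of_month(month_index):
    """Date of the first day of the month that is `month_index`
    months after January 1900 (what dateadd(month, n, 0) returns)."""
    return date(1900 + month_index // 12, month_index % 12 + 1, 1)

def last_month_range(today):
    """Return (first day of last month, first day of this month)."""
    # Months elapsed from January 1900 to the current month.
    this_month = (today.year - 1900) * 12 + (today.month - 1)
    return first_of_month(this_month - 1), first_of_month(this_month)

start, end = last_month_range(date(2010, 3, 15))
print(start, end)  # 2010-02-01 2010-03-01
```

Because both bounds are plain values compared against SCORE_DATE itself, the column is not wrapped in a function, and the filter `SCORE_DATE >= start and SCORE_DATE < end` can use an index on that column.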
Give this a try:
--I am using @table variables here, you will want to use your actual table in place of @YourTable and a #Temptable for @YourTable2, with a PK on ID
SET NOCOUNT ON
DECLARE @YourTable table (ID int,ENTITY_ID char(9),SCORE_DATE datetime,SCORE int ,SOME_OTHER_DATA varchar(100))
DECLARE @YourTable2 table (ID int)
INSERT INTO @YourTable VALUES (1 , 'A00000001','01/01/2010',100,'some data')
INSERT INTO @YourTable VALUES (2 , 'A00000002','01/01/2010',105,'more data')
INSERT INTO @YourTable VALUES (3 , 'A00000003','01/01/2010',104,'various text')
INSERT INTO @YourTable VALUES (4 , 'A00009999','01/01/2010',101,null)
INSERT INTO @YourTable VALUES (5 , 'A00000001','02/01/2010',104,null)
INSERT INTO @YourTable VALUES (6 , 'A00000002','02/01/2010',119,null)
INSERT INTO @YourTable VALUES (7 , 'A00000003','02/01/2010',119,null)
INSERT INTO @YourTable VALUES (8 , 'A00009999','02/01/2010',101,'arbitrary data')
INSERT INTO @YourTable VALUES (9 , 'A00000001','01/02/2010',104,null)
INSERT INTO @YourTable VALUES (10, 'A00000002','01/02/2010',119,null)
INSERT INTO @YourTable VALUES (11, 'A00000003','01/01/2010',119,null)
SET NOCOUNT OFF
SELECT 'BEFORE',* FROM @YourTable ORDER BY ENTITY_ID,SCORE_DATE
UPDATE y
SET SCORE=dt_a.AvgScore
OUTPUT INSERTED.ID --capture all updated rows
INTO @YourTable2
FROM @YourTable y
INNER JOIN (SELECT --get avg score for each ENTITY_ID per month
ENTITY_ID
,AVG(SCORE) as AvgScore
, DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0) AS MonthOf,DATEADD(month,1,DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0)) AS MonthNext
FROM @YourTable
--group by 1st day of current month and 1st day of next month
--so an index can be used when joining derived table to UPDATE table
GROUP BY ENTITY_ID, DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0),DATEADD(month,1,DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0))
) dt_a ON y.ENTITY_ID=dt_a.ENTITY_ID AND y.SCORE_DATE>=dt_a.MonthOf AND y.SCORE_DATE<dt_a.MonthNext
INNER JOIN (SELECT--get first row for each ENTITY_ID per month
ID,ENTITY_ID,SCORE_DATE,SCORE
FROM (SELECT
ID,ENTITY_ID,SCORE_DATE,SCORE
,ROW_NUMBER() OVER(PARTITION BY ENTITY_ID,DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0) ORDER BY ENTITY_ID,SCORE_DATE) AS RowRank
FROM @YourTable
) dt
WHERE dt.RowRank=1
) dt_f ON y.ID=dt_f.ID
DELETE @YourTable
WHERE ID NOT IN (SELECT ID FROM @YourTable2)
SELECT 'AFTER ',* FROM @YourTable ORDER BY ENTITY_ID,SCORE_DATE
OUTPUT:
ID ENTITY_ID SCORE_DATE SCORE SOME_OTHER_DATA
------ ----------- --------- ----------------------- ----------- ----------------------------------------------------------------------------------------------------
BEFORE 1 A00000001 2010-01-01 00:00:00.000 100 some data
BEFORE 9 A00000001 2010-01-02 00:00:00.000 104 NULL
BEFORE 5 A00000001 2010-02-01 00:00:00.000 104 NULL
BEFORE 2 A00000002 2010-01-01 00:00:00.000 105 more data
BEFORE 10 A00000002 2010-01-02 00:00:00.000 119 NULL
BEFORE 6 A00000002 2010-02-01 00:00:00.000 119 NULL
BEFORE 3 A00000003 2010-01-01 00:00:00.000 104 various text
BEFORE 11 A00000003 2010-01-01 00:00:00.000 119 NULL
BEFORE 7 A00000003 2010-02-01 00:00:00.000 119 NULL
BEFORE 4 A00009999 2010-01-01 00:00:00.000 101 NULL
BEFORE 8 A00009999 2010-02-01 00:00:00.000 101 arbitrary data
(11 row(s) affected)
(8 row(s) affected)
(3 row(s) affected)
ID ENTITY_ID SCORE_DATE SCORE SOME_OTHER_DATA
------ ----------- --------- ----------------------- ----------- ----------------------------------------------------------------------------------------------------
AFTER 1 A00000001 2010-01-01 00:00:00.000 102 some data
AFTER 5 A00000001 2010-02-01 00:00:00.000 104 NULL
AFTER 2 A00000002 2010-01-01 00:00:00.000 112 more data
AFTER 6 A00000002 2010-02-01 00:00:00.000 119 NULL
AFTER 3 A00000003 2010-01-01 00:00:00.000 111 various text
AFTER 7 A00000003 2010-02-01 00:00:00.000 119 NULL
AFTER 4 A00009999 2010-01-01 00:00:00.000 101 NULL
AFTER 8 A00009999 2010-02-01 00:00:00.000 101 arbitrary data
(8 row(s) affected)