Break down by weeks in SQL Server

Given this query:
DECLARE
    @FROM_DT DATETIME,
    @TO_DT DATETIME
BEGIN
    SET @FROM_DT = '10/01/2009'
    SET @TO_DT = DATEADD(DAY, 7, @FROM_DT)
    --WHILE (@FROM_DT <= '10/01/2010')
    WHILE (@TO_DT < '10/01/2010')
    BEGIN
        SELECT
            CONVERT(CHAR(10), @FROM_DT, 101) AS FROM_DT,
            CONVERT(CHAR(10), DATEADD(DAY, -1, @TO_DT), 101) AS TO_DT,
            COUNTRY AS CITZ,
            COUNT(SUBJECT_KEY) AS PEOPLE
        FROM MYTABLE
        WHERE DATE_DT >= @FROM_DT
          AND DATE_DT < @TO_DT
        GROUP BY COUNTRY
        SET @FROM_DT = DATEADD(DAY, 7, @FROM_DT)
        SET @TO_DT = DATEADD(DAY, 7, @TO_DT)
    END
END
Here are my results:
FROM_DT TO_DT COUNTRY PEOPLE
10/01/2009 10/07/2009 A 2
10/01/2009 10/07/2009 B 1
FROM_DT TO_DT COUNTRY PEOPLE
10/08/2009 10/14/2009 A 1
10/08/2009 10/14/2009 C 2
... (intervening weeks omitted) ...
FROM_DT TO_DT COUNTRY PEOPLE
09/23/2010 09/29/2010 A 1
09/23/2010 09/29/2010 B 3
FROM_DT TO_DT COUNTRY PEOPLE
09/30/2010 10/06/2010 C 13
09/30/2010 10/06/2010 D 1
Question:
Is there a way in SQL to write the output like below? (I need to consolidate the data. I could copy and paste it, but it's 52 weeks of data, so that is not an efficient way to do it.) Please help. I use SQL Server 2005 and 2008.
FROM_DT TO_DT COUNTRY PEOPLE
10/01/2009 10/07/2009 A 2
10/01/2009 10/07/2009 B 1
10/08/2009 10/14/2009 A 1
10/08/2009 10/14/2009 C 2
09/23/2010 09/29/2010 A 1
09/23/2010 09/29/2010 B 3
09/30/2010 10/06/2010 C 13
----
From the query above, I commented out WHILE (@FROM_DT <= '10/01/2010') and replaced it with WHILE (@TO_DT < '10/01/2010') because I would like to get the data for FY10 only, which runs from 10/1/2009 to 9/30/2010. However, the results only go up to 9/29/2010; the data for 9/30/2010 is not included. Is something wrong with my query? Please help!

Well, SQL Server has a function called DATEPART which can also give you the WEEK part of a date - something like:
SELECT
    DATEPART(WEEK, DATE_DT) AS WeekNumber,
    Country AS CITZ,
    COUNT(Subject_Key) AS PEOPLE
FROM dbo.MyTable
GROUP BY
    Country, DATEPART(WEEK, DATE_DT)
This gives you the numeric week number (but not yet the from_date and to_date).
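If you also want the FROM/TO dates back rather than just the week number, one option is to bucket on days elapsed since the fiscal-year start instead of using DATEPART. A sketch, assuming the 10/01/2009 fiscal anchor from the question:

```sql
DECLARE @FY_START DATETIME
SET @FY_START = '10/01/2009'

SELECT
    -- integer division by 7 truncates, so each date falls into exactly one 7-day bucket
    CONVERT(CHAR(10), DATEADD(DAY, 7 * (DATEDIFF(DAY, @FY_START, DATE_DT) / 7), @FY_START), 101) AS FROM_DT,
    CONVERT(CHAR(10), DATEADD(DAY, 7 * (DATEDIFF(DAY, @FY_START, DATE_DT) / 7) + 6, @FY_START), 101) AS TO_DT,
    Country AS CITZ,
    COUNT(Subject_Key) AS PEOPLE
FROM dbo.MyTable
WHERE DATE_DT >= @FY_START
GROUP BY Country, DATEDIFF(DAY, @FY_START, DATE_DT) / 7
```

Unlike DATEPART(WEEK, ...), the buckets here start at the fiscal anchor rather than on the calendar's first day of the week.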
Or you could leave your basic query alone, but store the results into a temporary table:
CREATE TABLE dbo.tmp_Results(FromDT CHAR(10), ToDT CHAR(10),
    Country VARCHAR(100), [Count] INT)
and then just insert your results into that table for each run:
INSERT INTO dbo.tmp_Results(FromDT, ToDT, Country, [Count])
SELECT
    CONVERT(CHAR(10), @FROM_DT, 101) AS FROM_DT,
    CONVERT(CHAR(10), DATEADD(DAY, -1, @TO_DT), 101) AS TO_DT,
    COUNTRY AS CITZ,
    COUNT(SUBJECT_KEY) AS PEOPLE
FROM MYTABLE
WHERE DATE_DT >= @FROM_DT
  AND DATE_DT < @TO_DT
GROUP BY COUNTRY
and then select from that temp table in the end:
SELECT * FROM dbo.tmp_Results
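Putting those pieces together with the loop from the question, the whole batch might look like this (a sketch; the table name is assumed to be MYTABLE, and the WHILE condition uses <= so that the final week ending 9/30/2010 is included):

```sql
DECLARE @FROM_DT DATETIME, @TO_DT DATETIME
SET @FROM_DT = '10/01/2009'
SET @TO_DT = DATEADD(DAY, 7, @FROM_DT)

CREATE TABLE dbo.tmp_Results(FromDT CHAR(10), ToDT CHAR(10),
    Country VARCHAR(100), [Count] INT)

WHILE (@TO_DT <= '10/01/2010')   -- <= picks up the week ending 9/30/2010
BEGIN
    INSERT INTO dbo.tmp_Results(FromDT, ToDT, Country, [Count])
    SELECT
        CONVERT(CHAR(10), @FROM_DT, 101),
        CONVERT(CHAR(10), DATEADD(DAY, -1, @TO_DT), 101),
        COUNTRY,
        COUNT(SUBJECT_KEY)
    FROM MYTABLE
    WHERE DATE_DT >= @FROM_DT
      AND DATE_DT < @TO_DT
    GROUP BY COUNTRY

    SET @FROM_DT = DATEADD(DAY, 7, @FROM_DT)
    SET @TO_DT = DATEADD(DAY, 7, @TO_DT)
END

SELECT * FROM dbo.tmp_Results   -- one consolidated result set
DROP TABLE dbo.tmp_Results
```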

Recursive CTEs to the rescue! No need for temporary tables anymore since you can generate your set on the fly, and can start the weeks at any date instead of on Monday.
(Danger: Written in notepad. Minor bugs / typos may be present. Right idea, though)
with weeks (week_start, week_end) as (
    -- START and END are reserved words, so the columns are renamed here
    select
        @from_dt as week_start,
        dateadd(day, 7, @from_dt) as week_end
    union all
    select
        dateadd(day, 7, week_start),
        dateadd(day, 7, week_end)
    from weeks
    where week_start < @last_dt
)
select
    w.week_start,
    w.week_end,
    c.country,
    count(c.subject_key) as people
from my_table c
join weeks w
    on c.date_dt >= w.week_start
   and c.date_dt < w.week_end
group by w.week_start, w.week_end, c.country
option (maxrecursion 0)  -- the default limit of 100 recursions would stop at 100 weeks

You could use a dynamic query with a UNION clause, but what I would do is create a temporary table and insert your results into it. Then you could select the data out of it and drop the temp table when done.
Your other option would be to create a table that holds the from/to dates for your weeks and join on that table instead. This would actually be the preferred way to do it, but you would need to keep that table up to date with all of the dates you need.

An approach from data warehousing would be to create a "Weeks" table with all your possible weeks in it, along with their start and end dates:
Week StartDate EndDate
1 10/01/2009 10/07/2009
2 10/08/2009 10/14/2009
3 10/15/2009 10/21/2009
...
...and then just join to that. You fill the "Weeks" table once, in advance -- you can fill it up to the year 3000 if you want -- and then it's available in your database to do queries like this:
SELECT
StartDate, EndDate, COUNTRY, COUNT(SUBJECT_KEY) AS People
FROM
MYTABLE INNER JOIN Weeks ON DATE_DT BETWEEN StartDate AND EndDate
GROUP BY
StartDate, EndDate, Country
This often simplifies complicated queries when you need to do data analysis over a range of dates (and you can pre-build a similar "days" or "months" table.) It can also be faster, assuming you've indexed the tables appropriately. These tables are "time dimensions" in star schema data warehouse parlance.
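Filling the Weeks table is a one-time job. A minimal sketch, assuming the column names shown above and a fiscal-year start of 10/01/2009:

```sql
CREATE TABLE Weeks (Week INT, StartDate DATETIME, EndDate DATETIME)

DECLARE @week INT, @start DATETIME
SET @week = 1
SET @start = '10/01/2009'

-- fill as far into the future as you like; it only has to be done once
WHILE @start < '01/01/2020'
BEGIN
    INSERT INTO Weeks (Week, StartDate, EndDate)
    VALUES (@week, @start, DATEADD(DAY, 6, @start))

    SET @week = @week + 1
    SET @start = DATEADD(DAY, 7, @start)
END
```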

Try:
DECLARE
    @FROM_DT DATETIME,
    @TO_DT DATETIME
BEGIN
    SET @FROM_DT = '10/01/2009'
    SET @TO_DT = DATEADD(DAY, 7*53, @FROM_DT)
    SELECT
        CONVERT(CHAR(10), DATEADD(DAY, 7*WEEKNO, @FROM_DT), 101) AS FROM_DT,
        CONVERT(CHAR(10), DATEADD(DAY, 7*WEEKNO + 6, @FROM_DT), 101) AS TO_DT,
        COUNTRY AS CITZ,
        COUNT(SUBJECT_KEY) AS PEOPLE
    FROM
        (SELECT M.*, DATEDIFF(DAY, @FROM_DT, DATE_DT)/7 AS WEEKNO  -- integer division truncates
         FROM MYTABLE M
         WHERE DATE_DT >= @FROM_DT
           AND DATE_DT < @TO_DT) SQ
    GROUP BY COUNTRY, WEEKNO
END

SQL: Efficient distinct count by category over moving date window

The problem
I have a large amount of card transaction data in SQL tables (SQL Server). Each row has the following information:
Main table:
TxnDate     UserID  SpendCategory  Amount  AgeGroup
01/01/2018  ID1     Category1      100     18-29
02/03/2019  ID2     Category2      20      40-49
05/03/2019  ID3     Category1      200     30-39
08/03/2019  ID1     Category2      300     18-29
10/03/2019  ID2     Category2      300     40-49
What I need is to find the number of unique users who have had transactions in a particular SpendCategory, or in particular SpendCategory AgeGroup combination, over a moving window of 30 days (e.g. 1st Jan - 30th Jan, 2nd Jan - 31st Jan). So my output should be a table like:
TxnDate     SpendCategory  UniqueUsers
01/01/2018  Category1      800
01/01/2018  Category2      200
02/01/2018  Category1      600
02/01/2018  Category2      300
Or:
TxnDate     SpendCategory  AgeGroup  UniqueUsers
01/01/2018  Category1      18-29     800
01/01/2018  Category2      30-39     200
etc
What I've already tried
I have a solution that works; however, it is too slow and inefficient to run on larger datasets (e.g. hundreds of millions of rows).
I first create a table with all the date and SpendCategory/Age combinations that are needed, #all_rows:
TxnDate     SpendCategory
01/01/2019  Category1
01/01/2019  Category2
02/01/2019  Category1
02/01/2019  Category2
Which I can then use to run the following query:
--Create example table
CREATE TABLE #main (TxnDate date, UserID varchar(100), SpendCategory varchar(100));
INSERT INTO #main
VALUES ('01/01/2019', 'ID1', 'Category1'),
('01/01/2019', 'ID2', 'Category1'),
('02/06/2019', 'ID1', 'Category2'),
('02/06/2019', 'ID2', 'Category2')
--Create lookup table
CREATE TABLE #category_lookup (SpendCategory varchar(100))
INSERT INTO #category_lookup
VALUES ('Category1'), ('Category2')
--Create #all_rows
DECLARE @max_date date, @min_date date
SELECT @max_date = MAX(TxnDate) FROM #main
SELECT @min_date = MIN(TxnDate) FROM #main;
WITH ListDates(TxnDate) as (
SELECT @min_date
UNION ALL
SELECT DATEADD(DAY, 1, TxnDate)
FROM ListDates
WHERE TxnDate < @max_date)
SELECT DISTINCT T1.TxnDate, T2.SpendCategory
INTO #all_rows
FROM ListDates AS T1
CROSS JOIN (
SELECT DISTINCT SpendCategory
FROM #category_lookup) AS T2
OPTION (MAXRECURSION 0) -- needed when the date range exceeds 100 days
--Find unique users
SELECT t.TxnDate, t.SpendCategory,
(SELECT COUNT(DISTINCT UserID) AS UniqueUsers
FROM #main
WHERE TxnDate > DATEADD(DAY, -30, t.TxnDate)
AND TxnDate <= t.TxnDate
AND SpendCategory = t.SpendCategory
GROUP BY SpendCategory) AS UniqueUsers
FROM #all_rows as t
Fiddle link
This returns the correct result but is far too slow. Does anyone have an alternative approach that would be more efficient please?
Edit: Adding information as requested in the comments. Unfortunately, I work for a highly restrictive organisation so I do not have access to the query execution plan. I have added more details to the code example above to make it reproducible. Main is generally either a CTE or a temp table containing a subset of the full data available in a permanent table.
We have indexes set up on the Main table, they're non-clustered. The most relevant index to this query is a composite index on TxnDate, SpendCategory and UserID. The code takes at least a day to run on a sample of ~400 million rows, we'd like it to be as fast as possible.
Depending on your data distribution this might be faster:
SELECT t.TxnDate, t.SpendCategory, tmp.UniqueUsers
FROM #all_rows as t
cross apply (
    SELECT COUNT(DISTINCT UserID) AS UniqueUsers
    FROM Main m
    WHERE m.TxnDate > DATEADD(DAY, -30, t.TxnDate)
      AND m.TxnDate <= t.TxnDate
      AND m.SpendCategory = t.SpendCategory
) AS tmp(UniqueUsers);
EDIT: Also, calling DATEADD on every row of the outer table is unnecessary; it is better to compute the window bounds beforehand:
select DATEADD(day, -30, TxnDate) as FromDate,
    TxnDate as ToDate, SpendCategory
into #AllRows
from #all_rows;
SELECT t.ToDate AS TxnDate, t.SpendCategory, tmp.UniqueUsers
FROM #AllRows as t
cross apply (
    SELECT COUNT(DISTINCT UserID) AS UniqueUsers
    FROM Main m
    WHERE m.TxnDate > t.FromDate
      AND m.TxnDate <= t.ToDate
      AND m.SpendCategory = t.SpendCategory
) AS tmp(UniqueUsers);
The problem is that each row could be scanned 30 times.
I would use a helper table to accumulate the distinct users for each day and then scan that smaller table, something like this:
SELECT DISTINCT m.TxnDate, m.SpendCategory, m.AgeGroup, m.UserId
INTO #DailyCounts
FROM Main m
CREATE CLUSTERED INDEX tmpDailyCount on #DailyCounts(TxnDate, SpendCategory, AgeGroup)
SELECT t.TxnDate, t.SpendCategory, COUNT(DISTINCT dc.UserId) UniqueUsers
FROM #All_Rows t
INNER JOIN #DailyCounts dc
    ON dc.SpendCategory = t.SpendCategory
   AND dc.TxnDate > DATEADD(DAY, -30, t.TxnDate)
   AND dc.TxnDate <= t.TxnDate
GROUP BY t.TxnDate, t.SpendCategory
The same table will help to create both outputs
Here is my suggested approach. This follows the same approach as in Cetin Basoz's earlier answer, where user statistics are summarized and indexed ahead of the final query.
-- First summarize distinct UserIDs, AgeGroups, and SpendCategory by date
SELECT
    DISTINCT CAST(TxnDate AS DATE) AS TxnDate,
    SpendCategory, AgeGroup, UserId
INTO #DailyUsers
FROM Main
CREATE INDEX IX_tmpDailyUsers
ON #DailyUsers(TxnDate, SpendCategory, AgeGroup) INCLUDE(UserId)
-- Determine needed date range
DECLARE @MinDate DATE, @MaxDate DATE
SELECT @MinDate = MIN(TxnDate), @MaxDate = MAX(TxnDate)
FROM #DailyUsers
-- For each date, summarize the last 30 days worth of user activity
;WITH Dates AS (
    SELECT @MinDate AS EndDate
    UNION ALL
    SELECT DATEADD(day, 1, D.EndDate)
    FROM Dates D
    WHERE D.EndDate < @MaxDate
)
SELECT
    D.EndDate, U.SpendCategory, U.AgeGroup,
    COUNT(DISTINCT U.UserId) AS UniqueUsers
INTO #ThirtyDayCounts
FROM Dates D
JOIN #DailyUsers U
    ON U.TxnDate > DATEADD(day, -30, D.EndDate)
   AND U.TxnDate <= D.EndDate
GROUP BY D.EndDate, U.SpendCategory, U.AgeGroup
OPTION (MAXRECURSION 0)
CREATE INDEX IX_tmpThirtyDayCounts
ON #ThirtyDayCounts(EndDate, SpendCategory)
-- Now pull it together with what should be a simple efficient join
SELECT t.TxnDate, t.SpendCategory, tdc.AgeGroup, tdc.UniqueUsers
FROM #All_Rows t
JOIN #ThirtyDayCounts tdc
    ON tdc.SpendCategory = t.SpendCategory
   AND tdc.EndDate = CAST(t.TxnDate AS DATE)
(Note: The above is untested. If you spot errors, please comment and I will correct my post.)

Optimizing a SQL Query when joining two tables. Naive algorithm gives me millions of rows

I apologize, I am not sure how to word the heading for this question. If someone can rephrase it to better suit what I am asking, that would be greatly appreciated.
I have quite a problem that I have been stuck on for the longest time. I use Tableau in conjunction with SQL Server 2014.
I have a single table that essentially shows all the employees within our company, their hire date, and termination date (NULL if still employed). I am looking to generate a headcount for the past. Here is an example of this table:
employeeID HireDate TermDate FavouriteFish FavouriteColor
1 1/1/15 1/1/18 Cod Blue
2 4/12/16 NULL Bass Red
.
.
.
n
As you can see, this list can go on and on. In fact, the table in question currently has over 10,000 rows for all past and current employees.
My goal is to construct a view that shows, for each day of the last 5 years, the total head count of employed employees. Here is the kicker though... I need to retain the rest of the information, such as:
FavouriteFish FavouriteColor... and so on
The only way I can think of doing this, and it doesn't work so well because it is extremely slow, is to create a separate calendar table with each day of the year for the past 5 years, like so:
Date CrossJoinKey
1/1/2013 1
1/2/2013 1
1/3/2013 1
.
.
.
4/4/2018 1
From here I add a column called CrossJoinKey to my original Employee table, like so:
employeeID HireDate TermDate FavouriteFish FavouriteColor CrossJoinKey
1 1/1/15 1/1/18 Cod Blue 1
2 4/12/16 NULL Bass Red 1
.
.
.
n
From here I LEFT JOIN Calendar ON Employee.CrossJoinKey = Calendar.CrossJoinKey.
Hopefully you can immediately see the problem: it creates a relationship with A LOT OF ROWS!! In fact it gives me somewhere around 18 million rows. It gives me the information I am after, however it takes a LONG time to query, and when I import it into Tableau to create an extract, that takes a LONG time as well. However, once Tableau eventually creates the extract, it is relatively fast. I can use the inner guts to isolate and create a headcount by day for the past 5 years, by seeing if the date field is between the HireDate and TermDate. But this entire process needs to be run quite frequently, and I feel the current method is impractical.
I feel this is a naive way to accomplish what I am after, and I feel this problem must have been addressed before. Could anyone please shed some light on how to optimize this?
Word of note... I have considered essentially creating a query that populates a calendar table by looking through the employee table and 'counting' each employee that is still employed, but this method loses resolution and I am not able to retain any of the other data for the employees.
Something like this, shown below, works and is much faster, but NOT what I am looking for:
Date HeadCount
1/1/2013 1200
1/2/2013 1201
1/3/2013 1200
.
.
.
4/4/2018 5000
Thank you very much for spending some time on this.
UPDATE:
Here is a link to a google sheets data sample
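For what it's worth, the cross-join technique described above can also be expressed with the date test directly in the join condition, so the full 18-million-row product is never materialized; a sketch, assuming the Employee and Calendar tables from the question:

```sql
SELECT c.[Date], e.employeeID, e.HireDate, e.TermDate,
       e.FavouriteFish, e.FavouriteColor
FROM Employee e
JOIN Calendar c
    ON c.[Date] >= e.HireDate
   AND c.[Date] <= COALESCE(e.TermDate, c.[Date])  -- NULL TermDate = still employed, matches every day from hire on
```

Each employee then contributes one row per day of actual employment only, no CrossJoinKey column is needed, and all the attribute columns survive for filtering in Tableau.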
I've edited some of your data, as you can see in the @example table.
I wanted to note: you've mixed spellings =D (Favourite but Color). Please make it consistent: either FavoriteColor or FavouriteColour.
declare @example as table (
exampleid int identity(1,1) not null primary key clustered
, StartDate date not null
, TermDate date null
);
insert into @example (StartDate, TermDate)
select '1/1/2016', '1/1/2018' union all
select '4/3/2017', '1/10/2018' union all
select '9/3/2016', '2/4/2018' union all
select '5/9/2017', '11/21/2017' union all
select '9/18/2016', '11/15/2017' union all
select '12/12/2015', '2/8/2018' union all
select '6/18/2016', '12/20/2017' union all
select '7/26/2015', '11/4/2017' union all
select '1/7/2015', NULL union all
select '10/2/2013', '10/21/2013' union all
select '10/14/2013', '12/12/2017' union all
select '10/11/2013', '11/3/2017' union all
select '6/30/2015', '1/12/2018' union all
select '2/17/2016', NULL union all
select '8/12/2015', '11/26/2017' union all
select '12/2/2015', '11/15/2017' union all
select '3/30/2016', '11/30/2017' union all
select '6/18/2016', '11/9/2017' union all
select '4/3/2017', '2/12/2018' union all
select '3/26/2017', '1/15/2018' union all
select '1/27/2017', NULL union all
select '7/29/2016', '1/10/2018';
--| This is an adaptation of Aaron Bertrand's work (time dim table)
--| this will control the start date
declare @date datetime = '2013-10-01';
;with cte as (
select 1 ID
, @date date_
union all
select ID + 1
, dateadd(day, 1, date_)
from cte
)
, cte2 as (
select top 1000 ID
, cast(date_ as date) date_
, 0 Running
, iif(datepart(weekday, date_) in(1,7), 0,1) isWeekday
, datepart(weekday, date_) DayOfWeek
, datename(weekday, date_) DayOfWeekName
, month(date_) Month
, datename(month, date_) MonthName
, datepart(quarter, date_) Quarter
from cte
--option (maxrecursion 1000)
)
, cte3 as (
select a.id
, Date_
, b.StartDate
, iif(b.StartDate is not null, 1, 0) Add_
, iif(c.TermDate is not null, -1, 0) Remove_
from cte2 a
left join @example b
on a.date_ = b.StartDate
left join @example c
on a.date_ = c.TermDate
-- option (maxrecursion 1000)
)
select date_
--, Add_
--, Remove_
, sum((add_ + remove_)) over (order by date_ rows unbounded preceding) CurrentCount
from cte3
option (maxrecursion 1000)
Result Set:
date_ CurrentCount
2013-10-01 0
2013-10-02 1
2013-10-03 1
2013-10-04 1
2013-10-05 1

How to split a single row into multiple rows in DB2?

This is what I have in table xyz:
NAME AMOUNT BEGIN_DATE END_DATE
ABC 5.0 2013-05-11 2014-06-20
The following is what I want, using the IBM DB2 database:
NAME AMOUNT BEGIN_DATE END_DATE
ABC 5.0 2013-05-11 2013-12-31
ABC 5.0 2014-01-01 2014-06-30
Instead of just one row from the xyz table, I need to fetch two rows, as in the output above.
How do I split one row into two?
The following will only list rows where the begin and end dates either span exactly two years or fall within the same year.
SELECT
NAME,
AMOUNT,
BEGIN_DATE,
DATE(YEAR(BEGIN_DATE)||'-12-31') AS END_DATE
FROM xyz
WHERE YEAR(END_DATE)-YEAR(BEGIN_DATE)=1
UNION
SELECT
NAME,
AMOUNT,
DATE(YEAR(END_DATE)||'-01-01') AS BEGIN_DATE,
END_DATE
FROM xyz
WHERE YEAR(END_DATE)-YEAR(BEGIN_DATE)=1
UNION
SELECT
NAME,
AMOUNT,
BEGIN_DATE,
END_DATE
FROM xyz
WHERE YEAR(END_DATE)-YEAR(BEGIN_DATE)=0
ORDER BY BEGIN_DATE
You can make two SQL statements: one that selects the first part, using '2013-12-31' as a constant end date, and one that selects the second part, using '2014-01-01' as a constant start date. Then use UNION ALL to put them together.
If you also have some records that start and end within 2013, and therefore do not need to be split, you can get those separately, and exclude them from the other two queries. Other variations in your data might require some extra conditions, but this example should get you going:
select NAME, AMOUNT, BEGIN_DATE, END_DATE
from xyz
where END_DATE <= '2013-12-31'
UNION ALL
select NAME, AMOUNT, BEGIN_DATE, '2013-12-31'
from xyz
where END_DATE >= '2014-01-01'
UNION ALL
select NAME, AMOUNT, '2014-01-01', END_DATE
from xyz
where END_DATE >= '2014-01-01'

Loading Date Range Values to a Daily Grain Fact Table

ETL question here.
For a given table whose entries have a start and end date, what is the optimal method to retrieve counts for each day, including days that are not covered by any entry's start/end range (which should appear with a count of zero)?
Given Table Example
Stock
ID StartDate EndDate Category
1 1/1/2013 1/5/2013 Appliances
2 1/1/2013 1/10/2013 Appliances
3 1/2/2013 1/10/2013 Appliances
Output required
Available
Category EventDate Count
Appliances 1/1/2013 2
Appliances 1/2/2013 3
...
...
Appliances 1/10/2013 2
Appliances 1/11/2013 0
...
...
One method I know of, which takes FOREVER, is to create a table variable and run a WHILE block iterating through each day of the range I wish to retrieve, executing a query like this on each pass:
INSERT INTO #TempTable (Category, EventDate, [Count])
SELECT Category, @CurrentLoopDate, COUNT(*)
FROM Stock
WHERE @CurrentLoopDate BETWEEN StartDate AND EndDate
GROUP BY Category
Another method would be to create a table or temp table of dates covering the range I want, and join against it with a BETWEEN condition.
INSERT INTO #TempTable (Category, EventDate, [Count])
SELECT Category, DateTable.[Date], COUNT(*)
FROM DateTable
INNER JOIN Stock ON DateTable.[Date] BETWEEN StartDate AND EndDate
GROUP BY Category, DateTable.[Date]
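The date table for that join does not have to be maintained by hand; it can be generated once per run. A sketch, assuming a #DateTable temp table and an arbitrary range:

```sql
DECLARE @d DATE, @end DATE
SET @d = '2013-01-01'
SET @end = '2013-12-31'

CREATE TABLE #DateTable ([Date] DATE)

-- one row per day in the range
WHILE @d <= @end
BEGIN
    INSERT INTO #DateTable ([Date]) VALUES (@d)
    SET @d = DATEADD(DAY, 1, @d)
END
```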
Yet other methods are similar but use SSIS, but essentially are the same as the above two solutions.
Any GURUs know of a more efficient method?
Have you tried using a recursive CTE?
WITH Dates_CTE AS (
SELECT [ID]
,[StartDate]
,[EndDate]
,[Category]
FROM [dbo].[Stock]
UNION ALL
SELECT [ID]
,DATEADD(D, 1, [StartDate])
,[EndDate]
,[Category]
FROM Dates_cte d
WHERE DATEADD(D, 1, [StartDate]) <= EndDate
)
SELECT StartDate AS EventDate
    ,Category
    ,COUNT(*) AS [Count]
FROM Dates_CTE
GROUP BY StartDate, Category
OPTION (MAXRECURSION 0)
That should do the trick ;-)
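Note that the CTE above only emits days that fall inside some stock entry, so the zero-count days shown in the required output (e.g. 1/11/2013) are missing. One way to add them back, a sketch that combines the CTE with the date-table idea from the question:

```sql
;WITH Dates_CTE AS (
    SELECT [StartDate], [EndDate], [Category]
    FROM [dbo].[Stock]
    UNION ALL
    SELECT DATEADD(D, 1, [StartDate]), [EndDate], [Category]
    FROM Dates_CTE
    WHERE DATEADD(D, 1, [StartDate]) <= [EndDate]
),
Counts AS (
    SELECT StartDate AS EventDate, Category, COUNT(*) AS [Count]
    FROM Dates_CTE
    GROUP BY StartDate, Category
)
-- left join from the full calendar so uncovered days appear with 0
SELECT cat.Category, d.[Date] AS EventDate, COALESCE(c.[Count], 0) AS [Count]
FROM DateTable d
CROSS JOIN (SELECT DISTINCT Category FROM [dbo].[Stock]) cat
LEFT JOIN Counts c
    ON c.EventDate = d.[Date]
   AND c.Category = cat.Category
OPTION (MAXRECURSION 0)
```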

List Transactions that Meet Criteria

Realizing that another question I asked before may be too difficult, I'm changing my requirements.
I work for a credit card company. Our database has a customer table and a transaction table. Fields in the customer table are SSN and CustomerKey. Fields in the transaction table are CustomerKey, transaction date (Transdate), and transaction amount (TransAmt).
I need a query that identifies each SSN whose transaction amounts sum to more than $1000 within any two-day period in 2012. If an SSN qualifies, the query should return all of that SSN's transactions.
Here is an example of the raw data in the Transaction Table:
Trans#-----CustKey-----Date--------Amount
1-----------12345----01/01/12--------$600
2-----------12345----01/02/12--------$500
3-----------67890----01/03/12--------$10
4-----------98765----04/01/12--------$600
5-----------43210----04/02/12--------$600
6-----------43210----04/03/12--------$100
7-----------13579----04/02/12--------$600
8-----------24568----04/03/12--------$100
Here is an example of the raw data in the Customer Table:
CustKey-----SSN
12345------123456789
67890------123456789
98765------987654321
43210------987654321
13579------246801357
24568------246801357
Here are the results I need:
Trans#------SSN---------Date---------Amount
1--------123456789----01/01/12---------$600
2--------123456789----01/02/12---------$500
3--------123456789----01/03/12----------$10
4--------987654321----04/01/12---------$600
5--------987654321----04/02/12---------$600
6--------987654321----04/03/12---------$100
As you can see, my results include all transactions for SSNs 123456789 and 987654321, and exclude SSN 246801357.
One way of doing this is to roll through each two-day period within the year. Here is an SQL Fiddle example.
The idea is pretty simple:
1) Create a temp table to store all matching customers
create table CustomersToShow
(
SSN int
)
2) Loop through the year and populate the temp table with customers that match the amount criteria
declare @firstDayOfTheYear datetime = '1/1/2012';
declare @lastDayOfTheYear datetime = '12/31/2012';
declare @currentDate datetime = @firstDayOfTheYear;
declare @amountThreshold money = 1000;
while @currentDate <= @lastDayOfTheYear
begin
    insert into CustomersToShow(SSN)
    select b.SSN
    from transactions a
    join customers b
        on a.CustKey = b.CustKey
    where TransactionDate >= @currentDate
      and TransactionDate < DATEADD(day, 2, @currentDate)  -- a two-day window
    group by b.SSN
    having SUM(a.TransactionAmount) >= @amountThreshold
    set @currentDate = DATEADD(day, 1, @currentDate)  -- step one day so no two-day window is skipped
end
3) And then just select the matching transactions (DISTINCT guards against an SSN qualifying in more than one window):
select distinct a.TransNumber, b.SSN, a.TransactionDate, a.TransactionAmount
from transactions a
join customers b
    on a.CustKey = b.CustKey
join CustomersToShow c
    on b.SSN = c.SSN
Note: This will be slow...
While you could probably come up with a hacky way to do this via standard SQL, this is a problem that IMO is more suited to being solved by code (i.e. not by set-based logic / SQL).
It would be easy to solve if you sorted the transaction list by CustomerKey and date, then looped through the data. Ideally I would do this in application code, but alternatively you could write a stored procedure that uses a loop and a cursor.
This is easy and well suited to set-based logic if you look at it right. You simply need to join to a table that has every date range you're interested in. Every SQL Server database should have a utility table named integers (Oracle has an equivalent built in); it's useful surprisingly often:
CREATE TABLE integers ( n smallint, constraint PK_integers primary key clustered (n))
INSERT integers select top 1000 row_number() over (order by o.id) from sysobjects o cross join sysobjects
Your date table then looks like:
SELECT DATEADD(day, n-1, '2012') AS dtFrom, DATEADD(day, n, '2012') AS dtTo
FROM integers WHERE n <= 366
You can then (abbreviating):
SELECT ssn, dtFrom
FROM yourTables t
JOIN ( SELECT DATEADD(day, n-1, '2012') AS dtFrom, DATEADD(day, n, '2012') AS dtTo
       FROM integers WHERE n <= 366 ) d ON t.date BETWEEN d.dtFrom AND d.dtTo
GROUP BY ssn, dtFrom
HAVING SUM(amount) > 1000
You can then select all of those customers' transactions:
WHERE ssn in ( SELECT distinct ssn from ( <above query> ) t )
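Assembled into one statement, it might look like this (a sketch; the table and column names are taken loosely from the question's description, so adjust to the real schema):

```sql
SELECT t.TransNumber, c.SSN, t.TransDate, t.TransAmt
FROM Transactions t
JOIN Customers c ON c.CustKey = t.CustKey
WHERE c.SSN IN (
    SELECT q.ssn FROM (
        -- every SSN whose transactions in some two-day window of 2012 exceed $1000
        SELECT c2.SSN AS ssn
        FROM Transactions t2
        JOIN Customers c2 ON c2.CustKey = t2.CustKey
        JOIN ( SELECT DATEADD(day, n-1, '2012') AS dtFrom, DATEADD(day, n, '2012') AS dtTo
               FROM integers WHERE n <= 366 ) d
            ON t2.TransDate BETWEEN d.dtFrom AND d.dtTo
        GROUP BY c2.SSN, d.dtFrom
        HAVING SUM(t2.TransAmt) > 1000
    ) q
)
```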