How to do Excel-style arbitrary pivoting in MS SQL 2017 (cross join + loops) - sql

Could you please help me solve the task below in SQL (MS SQL Server 2017)? It is simple in Excel, but seems very complicated in SQL.
There is a table with clients and their activities split by days:
```
client   1may 2may 3may 4may 5may other days
client1  0    0    0    0    0    ...
client2  0    0    0    0    0    ...
client3  0    0    0    0    0    ...
client4  1    1    1    1    1    ...
client5  1    1    1    0    0    ...
```
It is necessary to create the same table (the same quantity of rows and columns), but turn the values into new ones according to the rule:
Current day value =
A) If all daily values during the week before the day, including the current one, = 1, then 1
B) If all daily values during the week before the day, including the current one, = 0, then 0
C) If the values are mixed, then we keep the status of the previous day (if the status of the previous day is not known, for example because the client is new, then 0)
In Excel, I do this using the formula: = IF (AND (AF2 = AE2; AE2 = AD2; AD2 = AC2; AC2 = AB2; AB2 = AA2; AA2 = Z2); current_day_value; IF (previous_day_value = ""; 0; previous_day_value )).
An example Excel file is attached.
Thank you very much.

First thing: it's NEVER a good idea to have dates as columns.
So step #1: transpose your columns to rows. In other words, build a table with three columns:
```
client date Value
client1 May1 0
client1 May2 0
client1 May3 0
.... ... ..
client4 May1 1
client4 May2 1
client4 May3 1
.... ... ..
```
Step #2: perform all the calculations you need by using the date field.
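Since no transposition code is shown, here is a minimal sketch of step #1. The table and column names are invented for the demo, and it runs against SQLite from Python to stay self-contained; in SQL Server the native tools for this are UNPIVOT or CROSS APPLY (VALUES ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wide (client TEXT, may1 INT, may2 INT, may3 INT)")
conn.execute("INSERT INTO wide VALUES ('client1', 0, 0, 0), ('client4', 1, 1, 1)")

# Transpose columns to rows: one SELECT branch per date column, glued
# together with UNION ALL. (SQL Server's UNPIVOT does the same natively.)
rows = conn.execute("""
    SELECT client, '2020-05-01' AS date, may1 AS value FROM wide
    UNION ALL
    SELECT client, '2020-05-02', may2 FROM wide
    UNION ALL
    SELECT client, '2020-05-03', may3 FROM wide
    ORDER BY client, date
""").fetchall()
for r in rows:
    print(r)
```

Each date column becomes one branch; if the column list isn't known in advance, dynamic SQL can generate the branches.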

Basically, you always carry over the status of the previous day, in any case (except null).
So I would do something like this (Oracle syntax, working in SQL Server too), supposing the first column is 1may:
INSERT INTO newTable (client, 1may, 2may, ....) SELECT client, 0, COALESCE(1may, 0), COALESCE(2may, 0), .... FROM oldTable;
Anyway, I too believe it is not good practice to put the days as columns of a relational table.

You're going to struggle with this because most SQL dialects don't allow "arbitrary pivoting"; that is, you need to specify the columns you want displayed in a pivot, whereas Excel will just do this for you. SQL can do it, but it requires dynamic SQL, which can get pretty complicated and annoying pretty fast.
I would suggest you use SQL just to construct the data, and then Excel or SSRS (as you're in T-SQL) to actually do the visualization.
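To illustrate what "dynamic SQL" means here: you query the distinct dates first, then build the pivot statement as a string. This is a sketch with made-up table and column names, using Python over SQLite so it's runnable; in T-SQL you would assemble the same string (typically with STRING_AGG and QUOTENAME) and run it via sp_executesql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (client TEXT, date TEXT, value INT)")
conn.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                 [("c1", "2020-05-01", 0), ("c1", "2020-05-02", 1),
                  ("c2", "2020-05-01", 1), ("c2", "2020-05-02", 1)])

# "Dynamic pivot": discover the column list at runtime, then build the SQL.
# (String interpolation is for illustration only; in production, whitelist or
# quote the generated identifiers to avoid SQL injection.)
dates = [d for (d,) in conn.execute("SELECT DISTINCT date FROM activity ORDER BY date")]
cols = ", ".join(
    f"MAX(CASE WHEN date = '{d}' THEN value END) AS \"{d}\"" for d in dates
)
pivot = conn.execute(
    f"SELECT client, {cols} FROM activity GROUP BY client ORDER BY client"
).fetchall()
print(pivot)
```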
Anyway, I think this does what you want:
WITH Data AS (
SELECT * FROM (VALUES
('Client 1',CONVERT(DATE, '2020-05-04'),1)
, ('Client 1',CONVERT(DATE, '2020-05-05'),1)
, ('Client 1',CONVERT(DATE, '2020-05-06'),1)
, ('Client 1',CONVERT(DATE, '2020-05-07'),0)
, ('Client 1',CONVERT(DATE, '2020-05-08'),0)
, ('Client 1',CONVERT(DATE, '2020-05-09'),0)
, ('Client 1',CONVERT(DATE, '2020-05-10'),1)
, ('Client 1',CONVERT(DATE, '2020-05-11'),1)
, ('Client 1',CONVERT(DATE, '2020-05-12'),1)
, ('Client 2',CONVERT(DATE, '2020-05-04'),1)
, ('Client 2',CONVERT(DATE, '2020-05-05'),0)
, ('Client 2',CONVERT(DATE, '2020-05-06'),0)
, ('Client 2',CONVERT(DATE, '2020-05-07'),1)
, ('Client 2',CONVERT(DATE, '2020-05-08'),0)
, ('Client 2',CONVERT(DATE, '2020-05-09'),1)
, ('Client 2',CONVERT(DATE, '2020-05-10'),0)
, ('Client 2',CONVERT(DATE, '2020-05-11'),1)
) x (Client, RowDate, Value)
)
SELECT
Client
, RowDate
, Value
, CASE
WHEN OnesBefore = DaysInWeek THEN 1
WHEN ZerosBefore = DaysInWeek THEN 0
ELSE PreviousDayValue
END As FinalCalculation
FROM (
-- This set uses windowing to calculate the intermediate values
SELECT
*
-- The count of the days present in the data; as part of the week may be missing, we can't assume 7
-- We only count up to this day, so it's in line with the other parts of the calculation
, COUNT(RowDate) OVER (PARTITION BY Client, WeekCommencing ORDER BY RowDate) AS DaysInWeek
-- Count up the 1's for this client and week, in date order, up to (and including) this date
, COUNT(IIF(Value = 1, 1, NULL)) OVER (PARTITION BY Client, WeekCommencing ORDER BY RowDate) AS OnesBefore
-- Count up the 0's for this client and week, in date order, up to (and including) this date
, COUNT(IIF(Value = 0, 1, NULL)) OVER (PARTITION BY Client, WeekCommencing ORDER BY RowDate) AS ZerosBefore
-- Get the previous day's value, or 0 if there isn't one
, COALESCE(LAG(Value) OVER (PARTITION BY Client, WeekCommencing ORDER BY RowDate), 0) AS PreviousDayValue
FROM (
-- This set adds a few simple values in that we can leverage later
SELECT
*
, DATEADD(DAY, -DATEPART(DW, RowDate) + 1, RowDate) As WeekCommencing
FROM Data
) AS DataWithExtras
) AS DataWithCalculations
As you haven't specified your table layout, I don't know what table and field names to use in my example. Hopefully, if this is correct, you can figure out how to slot it in place with what you have. If not, leave a comment.
I will note as well that I've made this purposely verbose. If you don't know what the OVER clause is, you'll need to do some reading: https://www.sqlshack.com/use-window-functions-sql-server/. The gist is that they do aggregations without actually crunching the rows together.
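A tiny illustration of that gist, on made-up data (run via SQLite from Python, which supports window functions from version 3.25): the aggregate with OVER attaches a running result to every row instead of collapsing the rows.

```python
import sqlite3  # assumes SQLite >= 3.25 for window function support

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (client TEXT, day INT, value INT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("c1", 1, 1), ("c1", 2, 1), ("c1", 3, 0), ("c1", 4, 1)])

# Every row survives; each one just gains a running SUM and the LAG'd
# previous value, exactly the building blocks the query above uses.
rows = conn.execute("""
    SELECT day, value,
           SUM(value) OVER (PARTITION BY client ORDER BY day) AS ones_so_far,
           COALESCE(LAG(value) OVER (PARTITION BY client ORDER BY day), 0) AS prev
    FROM t ORDER BY day
""").fetchall()
for r in rows:
    print(r)
```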
Edit: Adjusted the calculation to be able to account for an arbitrary number of days in the week

Thank you so much to everyone, especially to David and Massimo, who prompted me to restructure the data.
--we join clients and dates each with each and label clients with 'active' or 'inactive'
with a as (
select client, dates
from (select distinct client from dbo.clients) a
cross join (select dates from dates) b
)
, b as (
select b.dates
,1 active
,a.client
from clients a
join dbo.dates b on a.id = b.id
)
select client
,a.dates
,isnull(b.active, 0) active
into #tmp2
from a
left join b on a.client= b.client and a.dates = b.dates
--declare variables - for date start and for loop
declare @min_date date = (select min(dates) from #tmp2);
declare @n int = 1
declare @row int = (select count(distinct dates) from #tmp2) --number of the loop iterations
--delete data from the final results
delete from final_results
--fill the table with final results
--run the loop (each iteration = analyse of each 1-week range)
while @n<=@row
begin
with a as (
--run the loop
select client
,max(dates) dates
,sum (case when active = 1 then 1 else null end) sum_active
,sum (case when active = 0 then 1 else null end) sum_inactive
from #tmp2
where dates between dateadd(day, -7 + @n, @min_date) and dateadd(day, -1 + @n, @min_date)
group by client
)
INSERT INTO [dbo].[final_results]
(client
,[dates]
,[final_result])
select client
,dates
,case when sum_active = 7 then 1 --rule A
when sum_inactive = 7 then 0 -- rule B
else
(case when isnull(sum_active, 0) + isnull(sum_inactive, 0) < 7 then 0
else
(select final_result
from final_results b
where b.dates = dateadd(day, -1, a.dates)
and a.client= b.client) end
) end
from a
set @n=@n+1
end
if object_id(N'tempdb..#tmp2', 'U') is not null drop table #tmp2

Related

Splitting out a cost dynamically across weeks

I’m creating an interim table in SQL Server for use with PowerBI to query financial data.
I have a finance transactions table tblfinance with
CREATE TABLE TBLFinance
(ID int,
Value float,
EntryDate date,
ClientName varchar (250)
)
INSERT INTO TBLFinance(ID ,Value ,EntryDate ,ClientName)
VALUES(1,'1783.26','2018-10-31 00:00:00.000','Alpha')
, (2,'675.3','2018-11-30 00:00:00.000','Alpha')
, (3,'243.6','2018-12-31 00:00:00.000','Alpha')
, (4,'8.17','2019-01-31 00:00:00.000','Alpha')
, (5,'257.23','2019-01-31 00:00:00.000','Alpha')
, (6,'28','2019-02-28 00:00:00.000','Alpha')
, (7,'1470.61','2019-03-31 00:00:00.000','Bravo')
, (8,'1062.86','2019-04-30 00:00:00.000','Bravo')
, (9,'886.65','2019-05-31 00:00:00.000','Bravo')
, (10,'153.31','2019-05-31 00:00:00.000','Bravo')
, (11,'150.24','2019-06-30 00:00:00.000','Bravo')
, (12,'690.14','2019-07-31 00:00:00.000','Charlie')
, (13,'21.67','2019-08-31 00:00:00.000','Charlie')
, (14,'339.29','2018-10-31 00:00:00.000','Charlie')
, (15,'807.96','2018-11-30 00:00:00.000','Delta')
, (16,'48.94','2018-12-31 00:00:00.000','Delta')
I’m calculating transaction values that fall within a week. My week ends on a Sunday, so I have the following query:
INSERT INTO tblAnalysis
(WeekTotal
, WeekEnd
, Client
)
SELECT SUM (VALUE) AS WeekTotal
, dateadd (day, case when datepart (WEEKDAY, EntryDate) = 1 then 0 else 8 - datepart (WEEKDAY, EntryDate) end, EntryDate) AS WeekEnd
, ClientName as Client
FROM dbo.tblFinance
GROUP BY dateadd (day, case when datepart (WEEKDAY, EntryDate) = 1 then 0 else 8 - datepart (WEEKDAY, EntryDate) end, EntryDate), CLIENTNAME
I’ve now been informed that some of the costs incurred within a given week maybe monthly, and therefore need to be split into 4 weeks, or annually, so split into 52 weeks. I will write a case statement to update the costs based on ClientName, so assume there is an additional field called ‘Payfrequency’.
I want to avoid having to pull the values affected into a temp table, and effectively write this – because there’ll be different sums applied depending on frequency.
SELECT *
INTO #MonthlyCosts
FROM
(
SELECT
client
, VALUE / 4 AS VALUE
, WEEKENDING
FROM tblAnalysis
UNION
SELECT
client
, VALUE / 4 AS VALUE
, DATEADD(WEEK,1,WEEKENDING) AS WEEKENDING
FROM tblAnalysis
UNION
SELECT
client
, VALUE / 4 AS VALUE
, DATEADD(WEEK,2,WEEKENDING) AS WEEKENDING
FROM tblAnalysis
UNION
SELECT
client
, VALUE / 4 AS VALUE
, DATEADD(WEEK,3,WEEKENDING) AS WEEKENDING
FROM tblAnalysis
) AS A
I know I need a stored procedure to hold variables so the calculations can be carried out dynamically, but have no idea where to start.
You can use recursive CTEs to split the data:
with cte as (
select ID, Value, EntryDate, ClientName, payfrequency, 1 as n
from TBLFinance f
union all
select ID, Value, EntryDate, ClientName, payfrequency, n + 1
from cte
where n < payfrequency
)
select *
from cte;
Note that by default this is limited to 100 recursion steps. You can add option (maxrecursion 0) for an unlimited number of recursion steps.
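A runnable sketch of the same fan-out idea, using SQLite from Python with invented table and column names; the division by payfrequency is my addition, to show where the per-week value would come from when splitting a monthly or annual cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fin (id INT, value REAL, payfrequency INT)")
conn.executemany("INSERT INTO fin VALUES (?, ?, ?)",
                 [(1, 400.0, 4), (2, 52.0, 1)])

# Recursive CTE: each row re-joins against itself until n reaches
# payfrequency, so a row with payfrequency = 4 fans out into 4 weekly rows.
rows = conn.execute("""
    WITH RECURSIVE cte AS (
        SELECT id, value / payfrequency AS weekly, payfrequency, 1 AS n FROM fin
        UNION ALL
        SELECT id, weekly, payfrequency, n + 1 FROM cte WHERE n < payfrequency
    )
    SELECT id, weekly, n FROM cte ORDER BY id, n
""").fetchall()
print(rows)
```

In the real query you would also add n weeks to the entry date in the recursive member, the way the numbers-table version below does with DATEADD.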
The best solution would be to make use of a numbers table: a table on your server with one column holding a sequence of integer numbers.
You can then use it like this for your weekly values:
SELECT
client
, VALUE / 52 AS VALUE
, DATEADD(WEEK,N.Number,WEEKENDING) AS WEEKENDING
FROM tblAnalysis AS A
CROSS JOIN tblNumbers AS N
WHERE N.Number <= 52

Splitting up group by with relevant aggregates beyond the basic ones?

I'm not sure if this has been asked before because I'm having trouble even asking it myself. I think the best way to explain my dilemma is to use an example.
Say I've rated my happiness on a scale of 1-10 every day for 10 years and I have the results in a big table where I have a single date correspond to a single integer value of my happiness rating. I say, though, that I only care about my happiness over 60 day periods on average (this may seem weird but this is a simplified example). So I wrap up this information to a table where I now have a start date field, an end date field, and an average rating field where the start days are every day from the first day to the last over all 10 years, but the end dates are exactly 60 days later. To be clear, these 60 day periods are overlapping (one would share 59 days with the next one, 58 with the next, and so on).
Next I pick a threshold rating, say 5, where I want to categorize everything below it into a "bad" category and everything above into a "good" category. I could easily add another field and use a case structure to give every 60-day range a "good" or "bad" flag.
Then to sum it up, I want to display the total periods of "good" and "bad" from maximum beginning to maximum end date. This is where I'm stuck. I could group by the good/bad category and then just take min(start date) and max(end date), but then if, say, the ranges go from good to bad to good then to bad again, output would show overlapping ranges of good and bad. In the aforementioned situation, I would want to show four different ranges.
I realize this may seem clearer to me that it would to someone else so if you need clarification just ask.
Thank you
---EDIT---
Here's an example of what the before would look like:
StartDate| EndDate| MoodRating
------------+------------+------------
1/1/1991 |3/1/1991 | 7
1/2/1991 |3/2/1991 | 7
1/3/1991 |3/3/1991 | 4
1/4/1991 |3/4/1991 | 4
1/5/1991 |3/5/1991 | 7
1/6/1991 |3/6/1991 | 7
1/7/1991 |3/7/1991 | 4
1/8/1991 |3/8/1991 | 4
1/9/1991 |3/9/1991 | 4
And the after:
MinStart| MaxEnd | Good/Bad
-----------+------------+----------
1/1/1991|3/2/1991 |good
1/3/1991|3/4/1991 |bad
1/5/1991|3/6/1991 |good
1/7/1991|3/9/1991 |bad
Currently my query with the group by rating would show:
MinStart| MaxEnd | Good/Bad
-----------+------------+----------
1/1/1991|3/6/1991 |good
1/3/1991|3/9/1991 |bad
This is something along the lines of
select min(StartDate), max(EndDate), Good_Bad
from sourcetable
group by Good_Bad
While Jason A Long's answer may be correct - I can't read it or figure it out, so I figured I would post my own answer. Assuming that this isn't a process that you're going to be constantly running, the CURSOR's performance hit shouldn't matter. But (at least to me) this solution is very readable and can be easily modified.
In a nutshell - we insert the first record from your source table into our results table. Next, we grab the next record and see if the mood score is the same as the previous record. If it is, we simply update the previous record's end date with the current record's end date (extending the range). If not, we insert a new record. Rinse, repeat. Simple.
Here is your setup and some sample data:
DECLARE @MoodRanges TABLE (StartDate DATE, EndDate DATE, MoodRating int)
INSERT INTO @MoodRanges
VALUES
('1/1/1991','3/1/1991', 7),
('1/2/1991','3/2/1991', 7),
('1/3/1991','3/3/1991', 4),
('1/4/1991','3/4/1991', 4),
('1/5/1991','3/5/1991', 7),
('1/6/1991','3/6/1991', 7),
('1/7/1991','3/7/1991', 4),
('1/8/1991','3/8/1991', 4),
('1/9/1991','3/9/1991', 4)
Next, we can create a table to store our results, as well as some variable placeholders for our cursor:
DECLARE @MoodResults TABLE(ID INT IDENTITY(1, 1), StartDate DATE, EndDate DATE, MoodScore varchar(50))
DECLARE @CurrentStartDate DATE, @CurrentEndDate DATE, @CurrentMoodScore INT,
@PreviousStartDate DATE, @PreviousEndDate DATE, @PreviousMoodScore INT
Now we put all of the sample data into our CURSOR:
DECLARE MoodCursor CURSOR FOR
SELECT StartDate, EndDate, MoodRating
FROM @MoodRanges
OPEN MoodCursor
FETCH NEXT FROM MoodCursor INTO @CurrentStartDate, @CurrentEndDate, @CurrentMoodScore
WHILE @@FETCH_STATUS = 0
BEGIN
IF @PreviousStartDate IS NOT NULL
BEGIN
IF (@PreviousMoodScore >= 5 AND @CurrentMoodScore >= 5)
OR (@PreviousMoodScore < 5 AND @CurrentMoodScore < 5)
BEGIN
UPDATE @MoodResults
SET EndDate = @CurrentEndDate
WHERE ID = (SELECT MAX(ID) FROM @MoodResults)
END
ELSE
BEGIN
INSERT INTO
@MoodResults
VALUES
(@CurrentStartDate, @CurrentEndDate, CASE WHEN @CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
END
END
ELSE
BEGIN
INSERT INTO
@MoodResults
VALUES
(@CurrentStartDate, @CurrentEndDate, CASE WHEN @CurrentMoodScore >= 5 THEN 'GOOD' ELSE 'BAD' END)
END
SET @PreviousStartDate = @CurrentStartDate
SET @PreviousEndDate = @CurrentEndDate
SET @PreviousMoodScore = @CurrentMoodScore
FETCH NEXT FROM MoodCursor INTO @CurrentStartDate, @CurrentEndDate, @CurrentMoodScore
END
CLOSE MoodCursor
DEALLOCATE MoodCursor
And here are the results:
SELECT * FROM @MoodResults
ID StartDate EndDate MoodScore
----------- ---------- ---------- --------------------------------------------------
1 1991-01-01 1991-03-02 GOOD
2 1991-01-03 1991-03-04 BAD
3 1991-01-05 1991-03-06 GOOD
4 1991-01-07 1991-03-09 BAD
Is this what you're looking for?
IF OBJECT_ID('tempdb..#MyDailyMood', 'U') IS NOT NULL
DROP TABLE #MyDailyMood;
CREATE TABLE #MyDailyMood (
TheDate DATE NOT NULL,
MoodLevel INT NOT NULL
);
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
cte_Calendar (dt) AS (
SELECT TOP (DATEDIFF(dd, '2007-01-01', '2017-01-01'))
DATEADD(dd, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, '2007-01-01')
FROM
cte_n3 a CROSS JOIN cte_n3 b
)
INSERT #MyDailyMood (TheDate, MoodLevel)
SELECT
c.dt,
ABS(CHECKSUM(NEWID()) % 10) + 1
FROM
cte_Calendar c;
--==========================================================
WITH
cte_AddRN AS (
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS (
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
)
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup;
Post OP update solution...
WITH
cte_AddRN AS ( -- Add a row number to each row that resets to 1 every 60 rows.
SELECT
*,
RN = ISNULL(NULLIF(ROW_NUMBER() OVER (ORDER BY mdm.TheDate) % 60, 0), 60)
FROM
#MyDailyMood mdm
),
cte_AssignGroups AS ( -- Use DENSE_RANK to create groups based on the RN added above.
-- How it works: RN sets the row number 1 - 60 and then repeats itself,
-- but we don't want every 60th row grouped together. We want blocks of 60 consecutive rows grouped together.
-- DENSE_RANK accomplishes this by ranking within all the "1's", "2's"... and so on.
-- Verify with the following query... SELECT * FROM cte_AssignGroups ag ORDER BY ag.TheDate
SELECT
*,
DateGroup = DENSE_RANK() OVER (PARTITION BY arn.RN ORDER BY arn.TheDate)
FROM
cte_AddRN arn
),
cte_AggRange AS ( -- This is just a straightforward aggregation/rollup. It produces results similar to the sample data you posted in your edit.
SELECT
BegOfRange = MIN(ag.TheDate),
EndOfRange = MAX(ag.TheDate),
AverageMoodLevel = AVG(ag.MoodLevel),
GorB = CASE WHEN AVG(ag.MoodLevel) >= 5 THEN 'Good' ELSE 'Bad' END,
ag.DateGroup
FROM
cte_AssignGroups ag
GROUP BY
ag.DateGroup
),
cte_CompactGroup AS ( -- This time we're using dense rank to group all of the consecutive "Good" and "Bad" values so that they can be further aggregated below.
SELECT
ar.BegOfRange, ar.EndOfRange, ar.AverageMoodLevel, ar.GorB, ar.DateGroup,
DenseGroup = ar.DateGroup - DENSE_RANK() OVER (PARTITION BY ar.GorB ORDER BY ar.BegOfRange)
FROM
cte_AggRange ar
)
-- The final aggregation step...
SELECT
BegOfRange = MIN(cg.BegOfRange),
EndOfRange = MAX(cg.EndOfRange),
cg.GorB
FROM
cte_CompactGroup cg
GROUP BY
cg.DenseGroup,
cg.GorB
ORDER BY
BegOfRange;

calculate difference between two dates and generate new row for each entry

I am currently working on SSIS. I have a table with two columns Start and End dates. I need to calculate the days in between (including the start date and end date) and generate a row for each day with the other data repeating. The resulting dates should be stored in a new column.
The trick to making this work is to have a table that contains a list of all the days that are in the possible range.
In the following query, I fake it out by using the smallest date in our set MIN(Start).
I then generate a sequence of numbers from 1 to N based on the number of rows in the sys.all_columns view. That might be sufficient, it might not, but given the paucity of data it works for now. If you need more dates generated, CROSS APPLY the sys.all_columns view against itself.
I then use the numbers generated to build a list of dates via dateadd
I then take my ALLDATES derived table and perform an INNER JOIN to the original table, pinning the date generated for ALLDATES between Start and End columns (end points inclusive).
CREATE TABLE dbo.so_36392684
(
WeekNo int NOT NULL
, Start datetime NOT NULL
, [End] datetime NOT NULL
, SpecialEvents varchar(20) NULL
);
INSERT INTO
dbo.so_36392684
(WeekNo, Start, [End], SpecialEvents)
VALUES
(
1
, '1989-09-14'
, '1989-09-20'
, NULL
);
SELECT
S.WeekNo
, S.Start
, S.[End]
, S.SpecialEvents
, ALLDATES.ConsecutiveDays
FROM
(
SELECT
DATEADD(DAY, D.rn, S.Start) AS ConsecutiveDays
FROM
(
-- Find the first date in our table
SELECT
MIN(S.Start) AS Start
FROM
dbo.so_36392684 AS S
) AS S
CROSS APPLY
(
-- Generate a (hopefully) sufficiently large enough set of dates
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS rn
FROM
sys.all_columns AS AC
) D
) AS ALLDATES
INNER JOIN
dbo.so_36392684 AS S
ON ALLDATES.ConsecutiveDays >= S.Start
AND ALLDATES.ConsecutiveDays <= S.[End];
Result should look something like this
WeekNo Start End SpecialEvents ConsecutiveDays
1 1989-09-14 1989-09-20 NULL 1989-09-14
1 1989-09-14 1989-09-20 NULL 1989-09-15
1 1989-09-14 1989-09-20 NULL 1989-09-16
1 1989-09-14 1989-09-20 NULL 1989-09-17
1 1989-09-14 1989-09-20 NULL 1989-09-18
1 1989-09-14 1989-09-20 NULL 1989-09-19
1 1989-09-14 1989-09-20 NULL 1989-09-20

SQL Server IF 0, add value for following date

I am writing this query in MSSQL Server Management Studio, and I have run into a mental road block. Let's say I have a few buildings and as they have come online I want to compare their production vs what they were forecasted, P_AMOUNT and F_AMOUNT. These are two separate databases combined into one with a query (The production db and forecast db that is).
So my problem here is. I want to select the peak production here along side the forecast each B_ID. Most of the buildings are like B_ID #2 and have a forecast amount, but occasionally there is one that does not. Like B_ID #1. How would I go about rolling the date one month forward for only the F_Date if the F_Amount = 0?
SELECT P1.B_ID, P1.P_DATE, P1.P_AMOUNT, F1.F_DATE, F1.F_AMOUNT
FROM DB1.Production P1
INNER JOIN DB2.Forecast F1
ON P1.DB_ID = F1.DB_ID
Use a nested select of the Forecast table so that you get the minimum forecast date after the production date that has a value, something like:
SELECT MIN(F1.F_DATE)
FROM DB2.Forecast F1
WHERE F1.F_DATE > P1.P_DATE
AND F1.F_AMOUNT > 0
There are a handful of ways to solve your problem. One way is to rank each row based on F_DATE and when the F_AMOUNT value is zero, to pull the next row. Since you did not specify a version of SQL Server, I used syntax that would work in SQL Server 2005 or later.
With RnkItems As
(
Select B_ID, P_DATE, P_AMOUNT, F_DATE, F_AMOUNT
, Row_Number() Over ( Order By F_DATE ) As Rnk
From SourceData
)
Select R.B_ID, R.P_DATE, R.P_AMOUNT
, R.F_DATE As [Original F_DATE]
, R.F_AMOUNT As [Original F_AMOUNT]
, Case R.F_AMOUNT
When 0 Then R1.F_DATE
Else R.F_DATE
End As F_DATE
, Case R.F_AMOUNT
When 0 Then R1.F_AMOUNT
Else R.F_AMOUNT
End As F_AMOUNT
From RnkItems As R
Left Join RnkItems As R1
On R1.Rnk = R.Rnk + 1
SQL Fiddle Version
If you are using SQL Server 2012, you can use the new Lead function:
Select B_ID, P_DATE, P_AMOUNT
, F_DATE As [Original F_DATE]
, F_AMOUNT As [Original F_AMOUNT]
, Case F_AMOUNT
When 0 Then Lead ( F_DATE, 1 ) Over ( Order By F_DATE )
Else F_DATE
End As F_DATE
, Case F_AMOUNT
When 0 Then Lead ( F_AMOUNT, 1 ) Over ( Order By F_DATE )
Else F_AMOUNT
End As F_AMOUNT
From SourceData
SQL Fiddle Version
The above two solutions rely on the stated requirement that you always go exactly one month ahead. If, however, you want to go to the first month with a non-zero value (i.e., it could be multiple jumps), that's different:
Select S.B_ID, S.P_DATE, S.P_AMOUNT
, S.F_DATE As [Original F_DATE]
, S.F_AMOUNT As [Original F_AMOUNT]
, Case S.F_AMOUNT
When 0 Then (
Select Min( S2.F_DATE )
From SourceData As S2
Where S2.F_DATE >= S.F_DATE
And S2.F_AMOUNT <> 0
)
Else S.F_DATE
End As F_DATE
, Case S.F_AMOUNT
When 0 Then (
Select S1.F_AMOUNT
From SourceData As S1
Where S1.F_DATE = (
Select Min( S2.F_DATE )
From SourceData As S2
Where S2.F_DATE > S.F_DATE
And S2.F_AMOUNT <> 0
)
)
Else S.F_AMOUNT
End As F_AMOUNT
From SourceData As S
SQL Fiddle Version

SQL to check for 2 or more consecutive negative week values

I want to count the number of 2 or more consecutive week periods that have negative values within a range of weeks.
Example:
Week | Value
201301 | 10
201302 | -5 <--| both weeks have negative values and are consecutive
201303 | -6 <--|
Week | Value
201301 | 10
201302 | -5
201303 | 7
201304 | -2 <-- negative but not consecutive to the last negative value in 201302
Week | Value
201301 | 10
201302 | -5
201303 | -7
201304 | -2 <-- 1st group of negative and consecutive values
201305 | 0
201306 | -12
201307 | -8 <-- 2nd group of negative and consecutive values
Is there a better way of doing this other than using a cursor and a reset variable and checking through each row in order?
Here is some of the SQL I have setup to try and test this:
IF OBJECT_ID('TempDB..#ConsecutiveNegativeWeekTestOne') IS NOT NULL DROP TABLE #ConsecutiveNegativeWeekTestOne
IF OBJECT_ID('TempDB..#ConsecutiveNegativeWeekTestTwo') IS NOT NULL DROP TABLE #ConsecutiveNegativeWeekTestTwo
CREATE TABLE #ConsecutiveNegativeWeekTestOne
(
[Week] INT NOT NULL
,[Value] DECIMAL(18,6) NOT NULL
)
-- I have a condition where I expect to see at least 2 consecutive weeks with negative values
-- TRUE : Week 201328 & 201329 are both negative.
INSERT INTO #ConsecutiveNegativeWeekTestOne
VALUES
(201327, 5)
,(201328,-11)
,(201329,-18)
,(201330, 25)
,(201331, 30)
,(201332, -36)
,(201333, 43)
,(201334, 50)
,(201335, 59)
,(201336, 0)
,(201337, 0)
SELECT * FROM #ConsecutiveNegativeWeekTestOne
WHERE Value < 0
ORDER BY [Week] ASC
CREATE TABLE #ConsecutiveNegativeWeekTestTwo
(
[Week] INT NOT NULL
,[Value] DECIMAL(18,6) NOT NULL
)
-- FALSE: The negative weeks are not consecutive
INSERT INTO #ConsecutiveNegativeWeekTestTwo
VALUES
(201327, 5)
,(201328,-11)
,(201329,20)
,(201330, -25)
,(201331, 30)
,(201332, -36)
,(201333, 43)
,(201334, 50)
,(201335, -15)
,(201336, 0)
,(201337, 0)
SELECT * FROM #ConsecutiveNegativeWeekTestTwo
WHERE Value < 0
ORDER BY [Week] ASC
My SQL fiddle is also here:
http://sqlfiddle.com/#!3/ef54f/2
First, would you please share the formula for calculating week number, or provide a real date for each week, or some method to determine if there are 52 or 53 weeks in any particular year? Once you do that, I can make my queries properly skip missing data AND cross year boundaries.
Now to queries: this can be done without a JOIN, which depending on the exact indexes present, may improve performance a huge amount over any solution that does use JOINs. Then again, it may not. This is also harder to understand so may not be worth it if other solutions perform well enough (especially when the right indexes are present).
Simulate a PREORDER BY windowing function (respects gaps, ignores year boundaries):
WITH Calcs AS (
SELECT
Grp =
[Week] -- comment out to ignore gaps and gain year boundaries
-- Row_Number() OVER (ORDER BY [Week]) -- swap with previous line
- Row_Number() OVER
(PARTITION BY (SELECT 1 WHERE Value < 0) ORDER BY [Week]),
*
FROM dbo.ConsecutiveNegativeWeekTestOne
)
SELECT
[Week] = Min([Week])
-- NumWeeks = Count(*) -- if you want the count
FROM Calcs C
WHERE Value < 0
GROUP BY C.Grp
HAVING Count(*) >= 2
;
See a Live Demo at SQL Fiddle (1st query)
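The core trick in the query above is that `[Week]` minus a row number computed over just the negative rows is constant within each consecutive negative run. Here is a stripped-down, runnable version of that row-number-difference idea (SQLite via Python, using a subset of the test weeks from the setup data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE w (week INT, value REAL)")
conn.executemany("INSERT INTO w VALUES (?, ?)",
                 [(201327, 5), (201328, -11), (201329, -18), (201330, 25),
                  (201331, 30), (201332, -36), (201333, 43)])

# week - ROW_NUMBER() over only the negative rows is constant within each
# consecutive negative run, so grouping by it isolates the runs ("islands").
rows = conn.execute("""
    WITH neg AS (
        SELECT week, week - ROW_NUMBER() OVER (ORDER BY week) AS grp
        FROM w WHERE value < 0
    )
    SELECT MIN(week) AS start_week, COUNT(*) AS num_weeks
    FROM neg GROUP BY grp HAVING COUNT(*) >= 2
""").fetchall()
print(rows)
```

Like the original, this ignores year boundaries: weeks 201352 and 201401 would not be treated as consecutive.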
And another way, simulating LAG and LEAD with a CROSS JOIN and aggregates (respects gaps, ignores year boundaries):
WITH Groups AS (
SELECT
Grp = T.[Week] + X.Num,
*
FROM
dbo.ConsecutiveNegativeWeekTestOne T
CROSS JOIN (VALUES (-1), (0), (1)) X (Num)
)
SELECT
[Week] = Min(C.[Week])
-- Value = Min(C.Value)
FROM
Groups G
OUTER APPLY (SELECT G.* WHERE G.Num = 0) C
WHERE G.Value < 0
GROUP BY G.Grp
HAVING
Min(G.[Week]) = Min(C.[Week])
AND Max(G.[Week]) > Min(C.[Week])
;
See a Live Demo at SQL Fiddle (2nd query)
And, my original second query, but simplified (ignores gaps, handles year boundaries):
WITH Groups AS (
SELECT
Grp = (Row_Number() OVER (ORDER BY T.[Week]) + X.Num) / 3,
*
FROM
dbo.ConsecutiveNegativeWeekTestOne T
CROSS JOIN (VALUES (0), (2), (4)) X (Num)
)
SELECT
[Week] = Min(C.[Week])
-- Value = Min(C.Value)
FROM
Groups G
OUTER APPLY (SELECT G.* WHERE G.Num = 2) C
WHERE G.Value < 0
GROUP BY G.Grp
HAVING
Min(G.[Week]) = Min(C.[Week])
AND Max(G.[Week]) > Min(C.[Week])
;
Note: The execution plan for these may be rated as more expensive than other queries, but there will be only 1 table access instead of 2 or 3, and while the CPU may be higher it is still respectably low.
Note: I originally was not paying attention to only producing one row per group of negative values, and so I produced this query as only requiring 2 table accesses (respects gaps, ignores year boundaries):
SELECT
T1.[Week]
FROM
dbo.ConsecutiveNegativeWeekTestOne T1
WHERE
Value < 0
AND EXISTS (
SELECT *
FROM dbo.ConsecutiveNegativeWeekTestOne T2
WHERE
T2.Value < 0
AND T2.[Week] IN (T1.[Week] - 1, T1.[Week] + 1)
)
;
See a Live Demo at SQL Fiddle (3rd query)
However, I have now modified it to perform as required, showing only each starting date (respects gaps, ignored year boundaries):
SELECT
T1.[Week]
FROM
dbo.ConsecutiveNegativeWeekTestOne T1
WHERE
Value < 0
AND EXISTS (
SELECT *
FROM
dbo.ConsecutiveNegativeWeekTestOne T2
WHERE
T2.Value < 0
AND T1.[Week] - 1 <= T2.[Week]
AND T1.[Week] + 1 >= T2.[Week]
AND T1.[Week] <> T2.[Week]
HAVING
Min(T2.[Week]) > T1.[Week]
)
;
See a Live Demo at SQL Fiddle (3rd query)
Last, just for fun, here is a SQL Server 2012 and up version using LEAD and LAG:
WITH Weeks AS (
SELECT
PrevValue = Lag(Value, 1, 0) OVER (ORDER BY [Week]),
SubsValue = Lead(Value, 1, 0) OVER (ORDER BY [Week]),
PrevWeek = Lag(Week, 1, 0) OVER (ORDER BY [Week]),
SubsWeek = Lead(Week, 1, 0) OVER (ORDER BY [Week]),
*
FROM
dbo.ConsecutiveNegativeWeekTestOne
)
SELECT @Week = [Week]
FROM Weeks W
WHERE
(
[Week] - 1 > PrevWeek
OR PrevValue >= 0
)
AND Value < 0
AND SubsValue < 0
AND [Week] + 1 = SubsWeek
;
See a Live Demo at SQL Fiddle (4th query)
I am not sure I am doing this the best way as I haven't used these much, but it works nonetheless.
You should do some performance testing of the various queries presented to you, and pick the best one, considering that code should be, in order:
Correct
Clear
Concise
Fast
Seeing that some of my solutions are anything but clear, other solutions that are fast enough and concise enough will probably win out in the competition of which one to use in your own production code. But... maybe not! And maybe someone will appreciate seeing these techniques, even if they can't be used as-is this time.
So let's do some testing and see what the truth is about all this! Here is some test setup script. It will generate the same data on your own server as it did on mine:
IF Object_ID('dbo.ConsecutiveNegativeWeekTestOne', 'U') IS NOT NULL DROP TABLE dbo.ConsecutiveNegativeWeekTestOne;
GO
CREATE TABLE dbo.ConsecutiveNegativeWeekTestOne (
[Week] int NOT NULL CONSTRAINT PK_ConsecutiveNegativeWeekTestOne PRIMARY KEY CLUSTERED,
[Value] decimal(18,6) NOT NULL
);
SET NOCOUNT ON;
DECLARE
@f float = Rand(5.1415926535897932384626433832795028842),
@Dt datetime = '17530101',
@Week int;
WHILE @Dt <= '20140106' BEGIN
INSERT dbo.ConsecutiveNegativeWeekTestOne
SELECT
Format(@Dt, 'yyyy') + Right('0' + Convert(varchar(11), DateDiff(day, DateAdd(year, DateDiff(year, 0, @Dt), 0), @Dt) / 7 + 1), 2),
Rand() * 151 - 76
;
SET @Dt = DateAdd(day, 7, @Dt);
END;
This generates 13,620 weeks, from 175301 through 201401. I modified all the queries to select the Week values instead of the count, in the format SELECT @Week = Expression ... so that tests are not affected by returning rows to the client.
I tested only the gap-respecting, non-year-boundary-handling versions.
Results
Query Duration CPU Reads
------------------ -------- ----- ------
ErikE-Preorder 27 31 40
ErikE-CROSS 29 31 40
ErikE-Join-IN -------Awful---------
ErikE-Join-Revised 46 47 15069
ErikE-Lead-Lag 104 109 40
jods 12 16 120
Transact Charlie 12 16 120
Conclusions
The reduced reads of the non-JOIN versions are not significant enough to warrant their increased complexity.
The table is so small that the performance almost doesn't matter. 261 years of weeks is insignificant, so a normal business operation won't see any performance problem even with a poor query.
I tested with an index on Week (which is more than reasonable); doing two separate JOINs with a seek was far, far superior to any attempt to get the relevant related data in one swoop. Charlie and jods were spot on in their comments.
This data is not large enough to expose real differences between the queries in CPU and duration. The values above are representative, though at times the 31 ms were 16 ms and the 16 ms were 0 ms. Since the resolution is ~15 ms, this doesn't tell us much.
My tricky query techniques do perform better. They might be worth it in performance critical situations. But this is not one of those.
Lead and Lag may not always win. The presence of an index on the lookup value is probably what determines this. The ability to still pull prior/next values based on a certain order even when the order by value is not sequential may be one good use case for these functions.
You could use a combination of EXISTS checks.
Assuming you only want to know the groups (series of consecutive weeks, all negative):
--Find the potential start weeks
;WITH starts as (
SELECT [Week]
FROM #ConsecutiveNegativeWeekTestOne AS s
WHERE s.[Value] < 0
AND NOT EXISTS (
SELECT 1
FROM #ConsecutiveNegativeWeekTestOne AS p
WHERE p.[Week] = s.[Week] - 1
AND p.[Value] < 0
)
)
SELECT COUNT(*)
FROM
Starts AS s
WHERE EXISTS (
SELECT 1
FROM #ConsecutiveNegativeWeekTestOne AS n
WHERE n.[Week] = s.[Week] + 1
AND n.[Value] < 0
)
If you have an index on Week this query should even be moderately efficient.
You can replace LEAD and LAG with a self-join.
The counting idea is basically to count the starts of negative sequences rather than trying to consider each row.
SELECT COUNT(*)
FROM ConsecutiveNegativeWeekTestOne W
LEFT OUTER JOIN ConsecutiveNegativeWeekTestOne Prev
ON W.week = Prev.week + 1
INNER JOIN ConsecutiveNegativeWeekTestOne Next
ON W.week = Next.week - 1
WHERE W.value < 0
AND (Prev.value IS NULL OR Prev.value > 0)
AND Next.value < 0
Note that I simply did "week + 1", which would not work when there is a year change.