SQL Server 2012 - Running Total With Backlog & Carry Forward

Good afternoon,
Hope that you're all well and wish you a happy new year.
I'm experiencing some curious behaviour with a query I've written: the LAG function appears to return inconsistent results.
Essentially, I have a dataset (made up of 2 CTEs) which each contain the month (in MMM-YYYY format) and then one holds a count of tickets opened, and the other contains the same but for tickets closed.
What I am then doing is adding in a 'Backlog' column (which will be 0 for the first month in all cases) and a 'Carried Forward' column. The Carried Forward amount will be the balance of that month ( Created + Backlog - Resolved ) and will be reflected as the Backlog for the following month.
I had this ticking over quite nicely until I realised that negative backlogs were fudging the numbers a bit. What I mean is, for example:
10 Tickets Created
12 Tickets Resolved
0 Ticket Backlog
-2 Tickets Carried Forward
In this circumstance, I've had to zero any negative backlog for our reporting purposes.
This is seemingly where the problems come into play. For the first few months everything will be fine - the values will be right, carrying forward the correct numbers and factoring them into the calculations accordingly. But then it will carry over a number of seemingly indeterminable origin, which of course has a knock-on effect on accuracy from that point onwards.
With the Window Functions introduced with SQL Server 2012, this should be quite basic - but evidently not!
Whilst I'm quite happy to post code (I have tried a fair few ways of skinning this cat), I feel as though if someone is able to give a high-level overview of how it should be written, I'll see where I went wrong immediately. In doing so, I'll then respond accordingly with my attempt/s for completeness.
Thank you very much in advance!
Here is the relevant part of my query (the Created and Resolved CTEs that feed it are omitted):
, OpenClosed AS
(
    SELECT
        c.[Created Month] 'Month'
        , c.Tickets 'Created'
        , r.Tickets 'Resolved'
        , IIF( ( c.Tickets - r.Tickets ) < 0, 0, ( c.Tickets - r.Tickets ) ) 'Balance'
    FROM
        Created c
        JOIN Resolved r ON
            c.[Created Month] = r.[Resolved Month]
)
, CarryForward AS
(
    SELECT
        ROW_NUMBER() OVER( ORDER BY CAST( '1.' + Month AS DATETIME ) ) 'Row No'
        , Month 'Month'
        , Created 'Created'
        , Resolved 'Resolved'
        , LAG( Balance, 1, 0 ) OVER( ORDER BY CAST( '1.' + Month AS DATETIME ) ) 'Backlog'
        , IIF( ( ( Created + LAG( Balance, 1, 0 ) OVER( ORDER BY CAST( '1.' + Month AS DATETIME ) ) ) - Resolved ) < 0
             , 0
             , ( ( Created + LAG( Balance, 1, 0 ) OVER( ORDER BY CAST( '1.' + Month AS DATETIME ) ) ) - Resolved )
          ) 'Carry Forward'
    FROM
        OpenClosed
)
SELECT
    c1.Month 'Month'
    , c1.Created 'Created'
    , c1.Resolved 'Resolved'
    , c2.[Carry Forward] 'Backlog'
    , IIF( ( c1.Created + c2.[Carry Forward] ) - c1.Resolved < 0
         , 0
         , ( c1.Created + c2.[Carry Forward] ) - c1.Resolved
      ) 'Carried Forward'
FROM
    CarryForward c1
    JOIN CarryForward c2 ON
        c2.[Row No] = c1.[Row No] - 1

From the comments on the question: incidentally, the Created Month column should be reworked so that the year is placed before the month - like 2015-01 - which ensures correct ordering under default sorting.
If the date must be presented as Jan-2015 in the final report, do that presentational work as the very final step in the query.
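As a minimal sketch of that conversion - assuming [Created Month] holds strings like 'Jan-2015' and the session's language setting parses dd-MMM-yyyy literals (both assumptions):
-- Hypothetical: turn 'Jan-2015' into a sortable 'yyyy-MM' string
SELECT FORMAT( CAST( '01-' + [Created Month] AS DATE ), 'yyyy-MM' ) AS SortableMonth
FROM Created;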
WITH ticket_account AS
(
SELECT
c.[Created Month] AS Month
,c.Tickets AS Created
,r.Tickets AS Resolved
FROM
Created AS c
INNER JOIN
Resolved AS r
ON c.[Created Month] = r.[Resolved Month]
)
SELECT
*
,(SUM(Created) OVER (ORDER BY Month ASC) - SUM(Resolved) OVER (ORDER BY Month ASC)) AS Balance
FROM
ticket_account
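One hedged performance note on the query above: with only ORDER BY in the OVER clause, SQL Server 2012 defaults the window frame to RANGE UNBOUNDED PRECEDING, which spools to an on-disk worktable. Because Month is unique per row here, adding an explicit ROWS frame returns the same running balance and is typically faster:
SELECT
*
,(SUM(Created) OVER (ORDER BY Month ASC ROWS UNBOUNDED PRECEDING)
- SUM(Resolved) OVER (ORDER BY Month ASC ROWS UNBOUNDED PRECEDING)) AS Balance
FROM
ticket_account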


SQL Server group by overlapping 10 day intervals

I have a table which logs each individual piece produced across several production machines, going back a number of years. Periodically (e.g. once per week) I want to check this table to establish "best performance" records for each machine and product combination, storing them in a new table according to the following rules;
The machine must have produced a minimum of 10,000 parts over a 10 day period - if only 9000 parts were produced over 10 days, this is an invalid record
The machine must have been running the same product without changing over for the entire period i.e. if on day 5 the product changed, this is an invalid record
The performance data table [VisionMachineResults] looks like this:
ID | MCSAP  | DateTime                | ProductName | InspectionResult
---|--------|-------------------------|-------------|-----------------
1  | 123456 | 2020-01-01 08:29:34.456 | Product A   | 0
2  | 123456 | 2020-01-01 08:45:50.456 | Product B   | 1
3  | 844214 | 2020-01-01 08:34:48.456 | Product A   | 2
4  | 978415 | 2020-01-02 09:29:26.456 | Product C   | 0
5  | 985633 | 2020-01-04 23:29:11.456 | Product A   | 2
I am able to produce a result which gives a list of each individual day's performance per SAP / product combination, but I then need to process the data in a complex loop outside of SQL to establish the 10-day groups.
My current query is:
SELECT CAST(DateTime AS date) AS InputDate,
MCSAP,
ZAssetRegister.LocalName,
ProductName,
SUM(CASE WHEN InspectionResult = 0 THEN 1 END) AS OKParts,
COUNT(CASE WHEN InspectionResult > 0 THEN 1 END) AS NGParts
FROM [VisionMachineResults]
INNER JOIN ZAssetRegister ON VisionMachineResults.MCSAP = ZAssetRegister.SAP_Number
GROUP BY CAST(DateTime AS date),
MCSAP,
ProductName,
ZAssetRegister.LocalName
ORDER BY InputDate,
ZAssetRegister.LocalName;
Would it be possible to have the SQL query give the result in 10 day groups, instead of per individual day i.e.
01-01-2021 to 11-01-2021 | Machine 1 | Product 1 | 20,000 | 5,000
02-01-2021 to 12-01-2021 | Machine 1 | Product 1 | 22,000 | 1,000
03-01-2021 to 13-01-2021 | Machine 1 | Product 1 | 18,000 | 4,000
etc...
I would then iterate through the rows to find the one with the best percentage of OK parts. Any ideas appreciated!
This process needs to be considered on many levels. First, you mention 10 consecutive days. We don't know whether those days include weekends, whether the machines run 24/7, or whether the date ranges can skip over holidays. So "10 days" could be Jan 1 to Jan 10, but if you skip weekends, you only have 6 actual WEEKDAYS.
Next, consider a machine working on more than one product - switching between dates, or even within a single day.
As a commenter indicated, having a column name that is a reserved word (such as DateTime) is bad practice; check whether any new columns are common keywords that may cause confusion, and avoid them.
You also mention that you had to do complex looping checks, and asked how to handle joining out to 10 days, the splits, etc. I think I have a somewhat elegant approach that should prove rather simple in the scheme of things.
You are using SQL Server, so I will do this using TEMP tables via "#" table names. This way, when you are done with a connection - or when this becomes a stored procedure - you don't have to keep deleting and recreating them. That said, let me take you one step at a time.
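(One housekeeping sketch, in case you re-run the steps below in the same session while experimenting - drop each temp table if it already exists:)
IF OBJECT_ID('tempdb..#tmpPartDailyCounts') IS NOT NULL DROP TABLE #tmpPartDailyCounts;
IF OBJECT_ID('tempdb..#tmpPartDays') IS NOT NULL DROP TABLE #tmpPartDays;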
First, I'm creating a simple table matching your structure, even keeping the DateTime column name.
CREATE TABLE VisionMachineResults
(
ID int IDENTITY(1,1) NOT NULL,
MCSAP nvarchar(6) NOT NULL,
DateTime datetime NOT NULL,
ProductName nvarchar(10) NOT NULL,
InspectionResult int NOT NULL,
CONSTRAINT ID PRIMARY KEY CLUSTERED
(
[ID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Now I'm inserting data similar to yours, but not millions of rows. You mention you are looking 10 days out, so I padded the end with several extra days to simulate that. I also explicitly forced a product change by the one machine on Jan 5th, and added another product change on Jan 7th to trigger a "break" within your 10-day consideration. You'll see the results later.
insert into VisionMachineResults
(MCSAP, [DateTime], ProductName, InspectionResult )
values
( '123456', '2020-01-01 08:29:34.456', 'Product A', 0 ),
( '123456', '2020-01-01 08:29:34.456', 'Product B', 1 ),
( '844214', '2020-01-01 08:29:34.456', 'Product A', 2 ),
( '978415', '2020-01-02 08:29:34.456', 'Product C', 0 ),
( '985633', '2020-01-04 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-05 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-05 08:29:34.456', 'Product B', 0 ),
( '985633', '2020-01-06 08:29:34.456', 'Product A', 2 ),
( '985633', '2020-01-07 08:29:34.456', 'Product B', 0 ),
( '985633', '2020-01-08 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-09 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-10 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-11 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-12 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-13 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-14 08:29:34.456', 'Product A', 1 ),
( '985633', '2020-01-15 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-16 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-17 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-18 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-19 08:29:34.456', 'Product A', 0 ),
( '985633', '2020-01-20 08:29:34.456', 'Product A', 0 )
go
So now, consider this the baseline of YOUR production data. My first query does a bunch of things, storing the pre-aggregations INTO a #tmpPartDailyCounts result table. This way you can look at the different stages and apply sanity checks to my approach.
Here, per machine (MCSAP) and date (without the time portion), I grab certain aggregates, keeping them grouped by machine and date.
select
VMR.MCSAP,
cast(VMR.DateTime as Date) as InputDate,
min( VMR.ProductName ) ProductName,
max( VMR.ProductName ) LastProductName,
count( distinct VMR.ProductName ) as MultipleProductsSameDay,
sum( case when VMR.InspectionResult = 0 then 1 else 0 end ) OKParts,
sum( case when NOT VMR.InspectionResult = 0 then 1 else 0 end ) BadParts,
count(*) TotalParts
into
#tmpPartDailyCounts
from
VisionMachineResults VMR
group by
VMR.MCSAP,
cast(VMR.DateTime as Date)
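To apply that sanity check at this stage, just peek at the intermediate result:
SELECT * FROM #tmpPartDailyCounts ORDER BY MCSAP, InputDate;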
You were joining to an asset table; I don't think you really need that. If the machine made the product, does it matter whether a final assembly is complete? I don't know - you would know better.
Now, the aggregates and why. The min( VMR.ProductName ) ProductName and max( VMR.ProductName ) LastProductName just carry forward the product name built on the date in question for any final output. If only one product was made on a given day, both are the same anyhow - just pick one. If multiple products were made on a day, however, MIN() and MAX() will return different values; if the same product was built throughout, both values will be the same -- ON ANY SINGLE GIVEN DATE.
The rest are simple aggregates: OK parts, BAD parts (something was wrong), and the TOTAL parts created regardless of any inspection failure. The total is the primary qualifier for you to hit your 10,000; if you would rather require 10,000 GOOD parts, change accordingly.
Now, at this point, I have a pre-aggregation on a per-machine, per-date basis. Next, I want a counter applied sequentially over the dates on which each machine produced anything, and I pull this result into a temp table #tmpPartDays. The OVER/PARTITION puts the records in order of MCSAP, then date, and stamps each row with its ROW_NUMBER(). So if there is no activity for a given machine - say over a weekend or a holiday when the machine is not running - the sequential counter still runs 1 through however many active days there are. Again, query the result of this table and you'll see it.
Also, by working from the pre-aggregated table, a source of maybe 500k records boils down to, say, 450 machine/day rows; the next query only touches those 450 and will be very quick.
SELECT
PDC.MCSAP,
PDC.InputDate,
MultipleProductsSameDay,
ROW_NUMBER() OVER(PARTITION BY MCSAP
ORDER BY [InputDate] )
AS CapDay
into
#tmpPartDays
FROM
#tmpPartDailyCounts PDC
ORDER BY
PDC.MCSAP;
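And again, a quick peek at this stage shows the sequential CapDay per machine:
SELECT * FROM #tmpPartDays ORDER BY MCSAP, CapDay;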
Now for the kicker: tying this all together. I start with #tmpPartDays JOINed to itself on the same MCSAP, with a MUST-HAVE matching record exactly 10 sequential days out - which sidesteps weekends and holidays, since the counter is serially consecutive.
This gives me the begin/end date ranges: 1-10, 2-11, 3-12, 4-13, etc.
I then join to the #tmpPartDailyCounts result on the same machine, where the date falls between the respective begin (PD.InputDate) and end (PD2.InputDate). I re-apply the same aggregates to get the total counts WITHIN EACH machine + 10-day period. Run this query WITHOUT the HAVING clause to see what comes out.
select
PD.MCSAP,
PD.InputDate BeginDate,
PD2.InputDate EndDate,
SUM( PDC.MultipleProductsSameDay ) as TotalProductsMade,
sum( PDC.OKParts ) OKParts,
sum( PDC.BadParts ) BadParts,
sum( PDC.TotalParts ) TotalParts,
min( PDC.ProductName ) ProductName,
max( PDC.LastProductName ) LastProductName
from
#tmpPartDays PD
-- join again to get 10 days out for the END cycle
JOIN #tmpPartDays PD2
on PD.MCSAP = PD2.MCSAP
AND PD.CapDay +9 = PD2.CapDay
-- Now join to daily counts for same machine and within the 10 day period
JOIN #tmpPartDailyCounts PDC
on PD.MCSAP = PDC.MCSAP
AND PDC.InputDate >= PD.InputDate
AND PDC.InputDate <= PD2.InputDate
group by
PD.MCSAP,
PD.InputDate,
PD2.InputDate
having
SUM( PDC.MultipleProductsSameDay ) = 10
AND min( PDC.ProductName ) = max( PDC.LastProductName )
AND SUM( PDC.TotalParts ) >= 10
Finally, the elimination of the records you DON'T want. Since I don't have millions of records to simulate, just follow along. The HAVING clause applies three filters:
1. SUM( PDC.MultipleProductsSameDay ) = 10
If on ANY day more than one product was created, the sum would be 11 or more, indicating it was not the same product throughout, so that span is excluded. Equally, at the tail end of the data - say only 7 days of production - the sum never reaches 10, which was your 10-day qualifier.
2. AND min( PDC.ProductName ) = max( PDC.LastProductName )
Here, since we span back to the DAILY context, if ANY product changes on any date, the ProductName (via min) and LastProductName (via max) will differ, regardless of the day and regardless of the name itself. By making sure the min() and max() are the same, you know it is the same product across the entire span.
3. AND SUM( PDC.TotalParts ) >= 10
Finally, the count of parts made. In this case I used >= 10 because I was only testing with 1 item per day, so 10 days = 10 items. In your scenario you may have 987 parts one day and 1,100 another, balancing low and high production days; just change this to your 10,000 minimum.
The first SQL Fiddle shows the results down to per machine/day, with the sequential activity: the last MCSAP machine starts on Jan 4th but gets a sequential day row assignment starting at 1, giving proper context to the 1-10, 2-11, etc. ranges.
The second fiddle shows the final query WITHOUT the HAVING clause. The first couple of rows have TotalProductsMade = 11, which means that SOMEWHERE within the day-span in question different products were created, so they would be excluded from the final result. For the begin/end ranges Jan 6-15 and Jan 7-16, the MIN/MAX products show Product A and Product B, indicating that SOMEWHERE within the 10-day span a product switched - these too are excluded.
The final fiddle shows the results with the HAVING clause applied.
One option that comes to my mind is the use of a numbers table (google Jeff Moden on SQL Server Central for more background).
The numbers table is combined with a start date (from the range of dates to investigate): in addition to generating a date to join on, it also generates a "bucket" by which to group afterwards.
Similar to:
-- generate date frame from and to
DECLARE
@date_start date = Convert( date, '20211110', 112 ),
@date_end date = Convert( date, '20220110', 112 )
;
WITH
cteN
(
Number
)
AS
( -- build a list of 10 single digit numbers
SELECT Cast( 0 AS int ) AS Number UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
)
,
cteNumbers
(
Number
)
AS
( -- splice single digit numbers to list from 0 to 99999
SELECT
cN10000.Number * 10000 + cN1000.Number * 1000 + cN100.Number * 100 + cN10.Number * 10 + cN1.Number
FROM
cteN AS cN10000
CROSS JOIN cteN AS cN1000
CROSS JOIN cteN AS cN100
CROSS JOIN cteN AS cN10
CROSS JOIN cteN AS cN1
)
,
cteBucketOffset
(
DatediffNum,
Offset
)
AS
( -- determine the offset in datediffs to number buckets later correctly
SELECT
Cast( Datediff( dd, @date_start, @date_end ) AS int ) - 1 AS DatediffNum,
Cast( Datediff( dd, @date_start, @date_end ) % 10 AS tinyint ) - 1 AS Offset
)
,
cteDates
(
Dated,
Bucket,
BucketNumber,
BucketOffset,
DatediffNum
)
AS
( -- generate list of dates with bucket batches and numbers
SELECT
Dateadd( dd, cN.Number * -1, @date_end ) AS Dated,
Cast( ( cBO.Offset + cN.Number ) / 10 AS int ) AS Bucket,
Cast( ( cBO.Offset + cN.Number ) % 10 AS tinyint ) AS BucketNumber,
cBO.Offset,
cBO.DatediffNum
FROM
cteNumbers AS cN
CROSS JOIN cteBucketOffset AS cBO
WHERE
cN.Number <= Datediff( dd, @date_start, @date_end )
)
SELECT
*
FROM
cteDates AS cD
ORDER BY
cD.Dated ASC
;
Long-winded, due to showing each step. The result is a table on the fly, usable to join back to the raw data; "Bucket" can then be used instead of the date itself to group the raw data.
Once this is built, decisions can be made on the grouped conditions, like requiring a minimum number of rows.
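As a hedged sketch of that join-back (extending the cteDates chain above within the same statement; VisionMachineResults columns as in the question):
SELECT
    cD.Bucket,
    Min( cD.Dated ) AS BucketStart,
    Max( cD.Dated ) AS BucketEnd,
    Sum( CASE WHEN VMR.InspectionResult = 0 THEN 1 ELSE 0 END ) AS OKParts,
    Count( * ) AS TotalParts
FROM
    cteDates AS cD
    INNER JOIN VisionMachineResults AS VMR
        ON Cast( VMR.[DateTime] AS date ) = cD.Dated
GROUP BY
    cD.Bucket
;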
Seems just a matter of grouping on the year and the day of the year divided by 10.
SELECT
CONCAT(CONVERT(VARCHAR(10),MIN([DateTime]),105), ' to ', CONVERT(VARCHAR(10), MAX([DateTime]), 105)) AS InputDateRange
, MCSAP
, MAX(ZAssetRegister.LocalName) AS LocalName
, ProductName
, SUM(CASE WHEN InspectionResult = 0 THEN 1 END) AS OKParts
, COUNT(CASE WHEN InspectionResult > 0 THEN 1 END) AS NGParts
, COUNT(DISTINCT CAST([Datetime] AS DATE)) AS total_days
FROM VisionMachineResults
JOIN ZAssetRegister
ON VisionMachineResults.MCSAP = ZAssetRegister.SAP_Number
GROUP BY
DATEPART(YEAR, [DateTime]),
CEILING(DATEPART(DAYOFYEAR, [DateTime])/10.0),
MCSAP,
ProductName
ORDER BY
MIN([DateTime]),
MAX(ZAssetRegister.LocalName);
Simplified test on db<>fiddle here

How to Show Previous Year on Dynamic Date

I have a database that has customer, product, date and volume/revenue data. I'd like to create two NEW columns to show the previous year volume and revenue based on the date/customer/product.
I've tried unioning two views: one that has dates unchanged, and a second built on a CTE where I select the dates minus one year, with another select on top of that where VOL and REV are renamed VOL_PY and REV_PY. But the data is incomplete. Basically, the PY data only pulls volume and revenue if there is data in the prior year; for example, if a customer didn't sell a product in 2021 but DID in 2020, the 2020 figures wouldn't appear as VOL_PY, because there is no 2021 row to attach them to. How do I get my code to include matches on dates, but also the instances where there isn't data in the "current" year?
Here's what I'm going for:
[EXAMPLE DATA WITH NEW COLUMNS]
CURRENT YEAR VIEW:
SELECT
CUSTOMER
,PRODUCT
,DATE
,VOL
,REV
,0 AS VOL_HL_PY
,0 AS REV_DOLS_PY
,DATEADD(YEAR, -1, DATE) AS DATE_PY
FROM dbo.vwReporting
PREVIOUS YEAR VIEW:
WITH CTE_PYFIGURES
([AUTONUMBER]
,CUSTOMER
,PRODUCT
,DATE
,VOL
,REV
,DATE_PY
) AS
(
SELECT b.*
, DATEADD(YEAR, 1, DATE) AS DATE_PY
FROM dbo.basetable b
)
SELECT
v.CUSTOMER
,v.PRODUCT
,v.DATE
,0 AS VOL
,0 AS REV
,CTE.VOL_HL AS VOL_HL_PY
,CTE.REV_DOLS AS REV_DOLS_PY
,DATEADD(YEAR,-1,CTE.PERIOD_DATE_PY) AS PERIOD_DATE_PY
FROM dbo.vwReporting AS v
FULL OUTER JOIN CTE_PYFIGURES AS CTE ON CTE.CUSTOMER=V.CUSTOMER AND CTE.PRODUCT=V.PRODCUT AND CTE.DATE_PY=V.DATE
You need to offset your current year's data one year forward and then union it with the current data, placing zeroes for the "other" measures (VOL and REV on the previous-year rows; VOL_PY and REV_PY on the current-year rows). Then aggregate. This way you'll have all the dimension values that occur in either the current or the previous year.
with a as (
select
CUSTOMER
, PRODUCT
, [DATE]
, VOL
, REV
, 0 as vol_py
, 0 as rev_py
from dbo.vwReporting
union all
select
CUSTOMER
, PRODUCT
, dateadd(year, 1, [DATE]) as [DATE]
, 0 as VOL
, 0 as REV
, vol as vol_py
, rev as rev_py
from dbo.vwReporting
)
select
CUSTOMER
, PRODUCT
, [DATE]
, sum(vol) as vol
, sum(rev) as rev
, sum(vol_py) as vol_py
, sum(rev_py) as rev_py
from a
group by
CUSTOMER
, PRODUCT
, [DATE]

Need to add 3 months to each value within a column, based on the 1st '3 Months' calculated off the Admission Date column in T-SQL

I have 14K records table as the following (example of the data related to one particular client_id = 1002):
(my date format is mm/dd/yyyy, months come first)
ClientsEpisodes:
client_id adm_date disch_date
1002 3/11/2005 5/2/2005
1002 8/30/2005 2/16/2007
1002 3/16/2017 NULL
In SQL Server (T-SQL) I need to calculate a +3 months date into a new column, [3Months Date], where the first "+3 months" value is calculated off my existing [adm_date] column. Then 3 more months should be added to the value in [3Months Date], then another 3 months to the next value, and so on, until [3Months Date] <= [disch_date]. When [3Months Date] is past [disch_date], the row shouldn't be populated. If [disch_date] IS NULL, then the condition should instead be [3Months Date] <= the current date (whatever it is) from the GETDATE() function.
Here is what I expect to see as a result (the screenshot of the expected output, with the date offsets highlighted in different colours, is omitted).
Below, I'll clarify each populated (or not populated) data set in more detail:
My first [adm_date] from ClientsEpisode table was 3/11/2005.
Adding 3 months:
3/11/2005 + 3 months = 6/11/2005 - falls AFTER the initial [disch_date] (5/2/2005) - not populated
Next [adm_date] from ClientsEpisodes is 8/30/2005; + 3 months = 11/30/2005;
then + 3 months = 2/28/2006 (February has no 30th);
then + 3 months = 5/30/2006;
then + 3 months = 8/30/2006;
then + 3 months = 11/30/2006;
then the next + 3 months = 2/28/2007 - falls AFTER my [disch_date]
(2/16/2007) - not populated
The same algorithm applies for the next [adm_date]-[disch_date] set, 11/5/2007-2/7/2009 (in dark blue).
Then, where [adm_date] = 3/16/2017, I have [disch_date] IS NULL, so the algorithm applies until
[3Months Date] <= the current date (10/15/2020 in this case).
You can use a recursive common table expression. Below is an example. Note that you can change the DATEADD part (for example, add 90 days instead) - it's a matter of business logic.
DECLARE @DataSource TABLE
(
[client_id] INT
,[adm_date] DATE
,[disch_date] DATE
);
INSERT INTO @DataSource ([client_id], [adm_date], [disch_date])
VALUES (1002, '3/11/2005', '5/2/2005')
,(1002, '8/30/2005', '2/16/2007')
,(1002, '3/16/2017', NULL);
WITH DataSource AS
(
SELECT ROW_NUMBER() OVER(ORDER BY [client_id]) AS [row_id]
,[client_id]
,[adm_date]
,DATEADD(MONTH, 3, [adm_date]) AS [3Month Date]
,ISNULL([disch_date], GETUTCDATE()) AS [disch_date]
FROM @DataSource
WHERE DATEADD(MONTH, 3, [adm_date]) <= ISNULL([disch_date], GETUTCDATE())
),
RecursiveDataSource AS
(
SELECT [row_id]
,[client_id]
,[adm_date]
,[3Month Date]
,[disch_date]
,0 AS [level]
FROM DataSource
UNION ALL
SELECT DS.[row_id]
,DS.[client_id]
,DS.[adm_date]
,DATEADD(MONTH, 3, RDS.[3Month Date])
,DS.[disch_date]
,[level] + 1
FROM RecursiveDataSource RDS
INNER JOIN DataSource DS
ON RDS.[row_id] = DS.[row_id]
AND DATEADD(MONTH, 3, RDS.[3Month Date]) < DS.[disch_date]
)
SELECT *
FROM RecursiveDataSource
ORDER BY [row_id]
,[level];
This question already has an accepted answer, but you mention in its comments that you have performance problems. Try this instead - it's also a lot simpler.
A recursive CTE is really useful if the value of the next row depends on the value of the previous row.
Here, we don't need the answer to the previous row - we just add n x 3 months (e.g., 3 months, 6 months, 9 months) and filter the rows you want to keep.
Therefore, instead of doing a recursive CTE, just do it via set logic.
Here's some data setup:
CREATE TABLE #Datasource (client_id int, adm_date date, disch_date date);
INSERT INTO #Datasource (client_id, adm_date, disch_date) VALUES
(1002, '20050311', '20050502'),
(1002, '20050830', '20070216'),
(1002, '20170316', NULL),
(1002, '20071105', '20090207');
And here's the simple SELECT
WITH DataSourceMod AS
(SELECT client_id, adm_date, disch_date, ISNULL(disch_date, getdate()) AS disc_date_mod
FROM #Datasource
),
Nums_One_to_OneHundred AS
(SELECT a * 10 + b AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) B(b)
)
SELECT ds.client_id, ds.adm_date, ds.disch_date, DATEADD(month, 3*Nums.n, ds.adm_date) AS ThreeMonthDate
FROM DataSourceMod ds
CROSS JOIN Nums_One_to_OneHundred Nums
WHERE DATEADD(month, 3* Nums.n, ds.adm_date) <= ds.disc_date_mod
ORDER BY ds.client_id, ds.adm_date;
This works by:
Calculating the effective discharge date (the specified date, or today)
Calculating all possible rows for up to 300 months in the future (Nums_One_to_OneHundred has all the values from 1 to 100, each multiplied by 3 months)
Only taking those that fulfil the date condition
You can further optimise this if desired, by limiting the number of 3 months you need to add. Here's a rough version.
WITH DataSourceMod AS
(SELECT client_id, adm_date, disch_date, ISNULL(disch_date, getdate()) AS disc_date_mod,
FLOOR(DATEDIFF(month, adm_date, ISNULL(disch_date, getdate())) / 3) + 1 AS nMax
FROM #Datasource
),
Nums_One_to_OneHundred AS
(SELECT a * 10 + b AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) B(b)
)
SELECT ds.client_id, ds.adm_date, ds.disch_date, DATEADD(month, 3*Nums.n, ds.adm_date) AS ThreeMonthDate
FROM DataSourceMod ds
INNER JOIN Nums_One_to_OneHundred Nums ON Nums.n <= ds.nMax
WHERE DATEADD(month, 3* Nums.n, ds.adm_date) <= ds.disc_date_mod
ORDER BY ds.client_id, ds.adm_date;

Splitting out a cost dynamically across weeks

I’m creating an interim table in SQL Server for use with PowerBI to query financial data.
I have a finance transactions table tblfinance with
CREATE TABLE TBLFinance
(ID int,
Value float,
EntryDate date,
ClientName varchar (250)
)
INSERT INTO TBLFinance(ID ,Value ,EntryDate ,ClientName)
VALUES(1,'1783.26','2018-10-31 00:00:00.000','Alpha')
, (2,'675.3','2018-11-30 00:00:00.000','Alpha')
, (3,'243.6','2018-12-31 00:00:00.000','Alpha')
, (4,'8.17','2019-01-31 00:00:00.000','Alpha')
, (5,'257.23','2019-01-31 00:00:00.000','Alpha')
, (6,'28','2019-02-28 00:00:00.000','Alpha')
, (7,'1470.61','2019-03-31 00:00:00.000','Bravo')
, (8,'1062.86','2019-04-30 00:00:00.000','Bravo')
, (9,'886.65','2019-05-31 00:00:00.000','Bravo')
, (10,'153.31','2019-05-31 00:00:00.000','Bravo')
, (11,'150.24','2019-06-30 00:00:00.000','Bravo')
, (12,'690.14','2019-07-31 00:00:00.000','Charlie')
, (13,'21.67','2019-08-31 00:00:00.000','Charlie')
, (14,'339.29','2018-10-31 00:00:00.000','Charlie')
, (15,'807.96','2018-11-30 00:00:00.000','Delta')
, (16,'48.94','2018-12-31 00:00:00.000','Delta')
I’m calculating transaction values that fall within a week. My week ends on a Sunday, so I have the following query:
INSERT INTO tblAnalysis
(WeekTotal
, WeekEnd
, Client
)
SELECT SUM (VALUE) AS WeekTotal
, dateadd (day, case when datepart (WEEKDAY, EntryDate) = 1 then 0 else 8 - datepart (WEEKDAY, EntryDate) end, EntryDate) AS WeekEnd
, ClientName as Client
FROM dbo.tblFinance
GROUP BY dateadd (day, case when datepart (WEEKDAY, EntryDate) = 1 then 0 else 8 - datepart (WEEKDAY, EntryDate) end, EntryDate), CLIENTNAME
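One caveat worth flagging: DATEPART(WEEKDAY, ...) depends on the session's DATEFIRST setting, so the CASE above assumes the US English default, under which Sunday is weekday 1:
SET DATEFIRST 7; -- Sunday = 1, the us_english default the query above assumes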
I’ve now been informed that some of the costs incurred within a given week maybe monthly, and therefore need to be split into 4 weeks, or annually, so split into 52 weeks. I will write a case statement to update the costs based on ClientName, so assume there is an additional field called ‘Payfrequency’.
I want to avoid having to pull the values affected into a temp table, and effectively write this – because there’ll be different sums applied depending on frequency.
SELECT *
INTO #MonthlyCosts
FROM
(
SELECT
client
, VALUE / 4 AS VALUE
, WEEKENDING
FROM tblAnalysis
UNION
SELECT
client
, VALUE / 4 AS VALUE
, DATEADD(WEEK,1,WEEKENDING) AS WEEKENDING
FROM tblAnalysis
UNION
SELECT
client
, VALUE / 4 AS VALUE
, DATEADD(WEEK,2,WEEKENDING) AS WEEKENDING
FROM tblAnalysis
UNION
SELECT
client
, VALUE / 4 AS VALUE
, DATEADD(WEEK,3,WEEKENDING) AS WEEKENDING
FROM tblAnalysis
) AS A
I know I need a stored procedure to hold variables so the calculations can be carried out dynamically, but have no idea where to start.
You can use recursive CTEs to split the data:
with cte as (
select ID, Value, EntryDate, ClientName, payfrequency, 1 as n
from TBLFinance f
union all
select ID, Value, EntryDate, ClientName, payfrequency, n + 1
from cte
where n < payfrequency
)
select *
from cte;
Note that by default recursion is limited to 100 steps. You can add OPTION (MAXRECURSION 0) to remove the limit.
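The hint goes on the outermost statement, e.g.:
select *
from cte
option (maxrecursion 0);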
The best solution would be to make use of a numbers table: a table on your server with one column holding a sequence of integers.
You can then use it like this for your weekly values:
SELECT
client
, VALUE / 52 AS VALUE
, DATEADD(WEEK, N.Number, WEEKENDING) AS WEEKENDING
FROM tblAnalysis AS A
CROSS JOIN tblNumbers AS N
WHERE N.Number <= 52
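If you don't already have one, here's a minimal sketch of building such a numbers table (the name tblNumbers and column Number are carried over from the query above; the row source is an arbitrary choice):
CREATE TABLE tblNumbers (Number int NOT NULL PRIMARY KEY);
INSERT INTO tblNumbers (Number)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -- 1..1000
FROM sys.all_objects;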

How to calculate prior year sales data in SQL

I'm attempting to build a table summarizing sales data by week. In it, I'm trying to have one of the adjacent columns show the sales figures for the same fiscal week during the prior year (which, due to my organization's fiscal calendar, had a 53rd week last year). I also need to compare (Comp Units / Comp Sales) to a period 52 weeks ago, which is an entirely different fiscal week (think week 9 of 2019 comparing to week 10 of 2018).
I've tried using both unions and full outer joins, but given the shape of my data they're inefficient. (Because this is weekly data, unions meant leaving the date information out of the initial query and then updating columns in my table to reflect the week the data is for - rife with opportunity for error, and time-consuming to do 105 times.) A full outer join returned the wrong answers for all columns. I've also tried CTEs without success, and I'm currently trying a CASE statement, but that's also returning a null value. I'm not quite sure where to go next.
#STANDARDSQL
SELECT
DTL.SKU_NBR AS SKU_NBR
, SLS.STR_NBR AS STR_NBR
, CONCAT(TRIM(CAST(SKU_HIER.SKU_NBR AS STRING)), ' ', '-', ' ', TRIM(SKU_HIER.SKU_DESC)) AS SKU
, CONCAT(TRIM(CAST(SKU_HIER.EXT_SUB_CLASS_NBR AS STRING)), ' ', '-', ' ', TRIM(SKU_HIER.SUB_CLASS_DESC)) AS SUB_CLASS
, CONCAT(TRIM(CAST(SKU_HIER.EXT_SUB_SC_NBR AS STRING)), ' ', '-', ' ', TRIM(SKU_HIER.SUB_SC_DESC)) AS SUB_SUB_CLASS
, LOCATION.MKT_NM AS MARKET_NAME
, LOCATION.RGN_NM AS REGION_NAME
, LOCATION.DIV_NM AS DIVISION_NAME
, LOCATION.DIV_NBR AS DIVISION_NUMBER
, LOCATION.RGN_NBR AS REGION_NUMBER
, LOCATION.MKT_NBR AS MARKET_NUMBER
, COMP.STR_COMP_IND AS COMP_IND
, COMP.PY_STR_COMP_IND AS PRIOR_COMP_IND
, CALENDAR.FSCL_WK_DESC AS FISCAL_WEEK
, CALENDAR.FSCL_PRD_DESC AS FISCAL_PERIOD
, CALENDAR.FSCL_WK_END_DT AS END_DATE
, CALENDAR.FSCL_WK_BGN_DT AS BEGIN_DATE
, CALENDAR.FSCL_YR AS FISCAL_YEAR_NBR
, CALENDAR.FSCL_WK_NBR AS WEEK_NUMBER
, CALENDAR.FSCL_YR_WK_KEY_VAL AS FISCAL_KEY
, CALENDAR.LY_FYR_WK_KEY_VAL AS LY_FISCAL_KEY
, SUM(COALESCE(DTL.UNT_SLS,0)) AS UNITS
, SUM(COALESCE(DTL.EXT_RETL_AMT,0) + COALESCE(DTL.TOT_GDISC_DTL_AMT,0))
AS SALES
, SUM(CASE
WHEN 1=1 THEN (COALESCE(DTL.EXT_RETL_AMT,0) + COALESCE(DTL.TOT_GDISC_DTL_AMT,0)) * COMP.STR_COMP_IND
ELSE 0 END) AS COMP_SALES
, SUM(CASE
WHEN 1=1 THEN (COALESCE(DTL.UNT_SLS,0)) * COMP.STR_COMP_IND
ELSE 0 END) AS COMP_UNITS
, SUM(CASE
WHEN 1=1 AND SLS.SLS_DT = DATE_SUB(SLS.SLS_DT, INTERVAL 364 DAY)
THEN (COALESCE(DTL.EXT_RETL_AMT,0) +
COALESCE(DTL.TOT_GDISC_DTL_AMT,0)) * COMP.PY_STR_COMP_IND
ELSE NULL END)
AS LY_COMP_SALES
, SUM(CASE
WHEN 1=1 AND SLS.SLS_DT = DATE_SUB(SLS.SLS_DT, INTERVAL 364 DAY)
THEN (COALESCE(DTL.UNT_SLS,0)) * COMP.PY_STR_COMP_IND
ELSE NULL END)
AS LY_COMP_UNITS
, SUM(CASE
WHEN SLS.SLS_DT = DATE_SUB(SLS.SLS_DT, INTERVAL 371 DAY)
THEN (COALESCE(DTL.EXT_RETL_AMT,0) +
COALESCE(DTL.TOT_GDISC_DTL_AMT,0))
ELSE NULL END)
AS LY_SALES
, SUM(CASE
WHEN SLS.SLS_DT = DATE_SUB(SLS.SLS_DT, INTERVAL 371 DAY)
THEN (COALESCE(DTL.UNT_SLS,0))
ELSE NULL END)
AS LY_UNITS
FROM `pr-edw-views.SLS.POS_SLS_TRANS_DTL` AS SLS
INNER JOIN
UNNEST (SLS.DTL) AS DTL
JOIN `pr-edw-views.SHARED.MVNDR_HIER` AS MVNDR
ON DTL.DERIV_MVNDR.MVNDR_NBR = MVNDR.MVNDR_NBR
JOIN `pr-edw-views.SHARED.SKU_HIER_FD` AS SKU_HIER
ON SKU_HIER.SKU_NBR = DTL.SKU_NBR
AND SKU_HIER.SKU_CRT_DT = DTL.SKU_CRT_DT
JOIN `pr-edw-views.SHARED.LOC_HIER_FD` AS LOCATION
ON LOCATION.LOC_NBR = SLS.STR_NBR
JOIN `pr-edw-views.SHARED.CAL_PRD_HIER_FD` AS CALENDAR
ON CALENDAR.CAL_DT = SLS_DT
JOIN `pr-edw-views.SLS.STR_COMP_DAY` AS COMP
ON COMP.CAL_DT = CALENDAR.CAL_DT
AND COMP.STR_NBR = SLS.STR_NBR
WHERE CALENDAR.FSCL_WK_END_DT BETWEEN '2018-01-29' AND '2019-04-07'
AND SLS.SLS_DT BETWEEN '2018-01-29' AND '2019-04-07'
AND POS_TRANS_TYP_CD in ('S', 'R')
AND SKU_HIER.EXT_CLASS_NBR = '025-004'
AND MVNDR.MVNDR_NBR IN (74798, 60002238, 73059, 206820, 76009, 40263, 12879, 76722, 10830, 206823, 87752, 60052261, 70401, 51415, 51414)
AND SKU_HIER.LATEST_SKU_CRT_DT_FLG = TRUE
GROUP BY
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
I'm currently getting null values in my LY_SALES, LY_UNITS, LY_COMP_SALES and LY_COMP_UNITS columns, though I know there should have been locations with sales of those items from the same period the previous year. What I'm trying to get to is having those prior year values showing up along side the current year values. Any help would be hugely appreciated!
Thanks!
Such a condition can never be fulfilled: SLS.SLS_DT = DATE_SUB(SLS.SLS_DT, INTERVAL 371 DAY). Simply put, a SLS_DT is never equal to itself minus 371 days.
You can pre-aggregate the table in a CTE (adding SLS_DT to the GROUP BY columns) and then replace the CASE with a join to the pre-aggregated table, comparing the pre-aggregated date to the outer row's date. The CASE then becomes something like this (notice - no SUM in the CASE):
CASE WHEN AGGSLS.SLS_DT = DATE_SUB(SLS.SLS_DT, INTERVAL 371 DAY)
THEN (COALESCE(AGGSLS.SUM_EXT_RETL_AMT,0) +
COALESCE(AGGSLS.SUM_TOT_GDISC_DTL_AMT,0))
ELSE NULL END
Two things:
1) WHEN 1=1 can be expressed simply as WHEN TRUE; that way it is easier to move statements around without breaking the AND/OR chaining.
2) To get last year's sales, you can either omit the year from the final query and limit the output with a WHERE clause, or create a smaller table that has this year's sales and last year's sales per week.
In my humble opinion, sales-last-year per week number is the best option, as you can use it elsewhere - and it's pretty similar to what you wrote.
It would look something like:
SELECT CALENDAR.FSCL_WK_DESC as week_num,
sum(case when year = EXTRACT(YEAR FROM CURRENT_DATE()) then (COALESCE(DTL.UNT_SLS,0)) * COMP.STR_COMP_IND else 0 end) as this_year,
sum(case when year = EXTRACT(YEAR FROM CURRENT_DATE()) - 1 then (COALESCE(DTL.UNT_SLS,0)) * COMP.STR_COMP_IND else 0 end) as last_year
And then you join back to the original table using week_num
Hope you find it useful
Cheers!