Using Over(Partition By) when calculating median moving average unit cost - sql

Good morning,
I am trying to calculate a 12 month moving average cost (MAUC) for each item in a particular warehouse. I am using the 2012_B – paging trick to calculate the median price (http://sqlperformance.com/2012/08/t-sql-queries/median) instead of using AVG in order to remove the potential for outliers to skew the result.
The following code works, but it only calculates the MAUC for one item or for all items together, depending on whether I retain or remove "AND t_item = 'xxxxx'".
WITH Emily AS
(SELECT
t_item AS [Item Code]
,t_mauc_1 AS [MAUC]
FROM twhina113100
WHERE t_cwar = '11'
AND t_item = ' TNC-C2050NP-G'
AND t_trdt > GETDATE()-365)
(SELECT
AVG(1.0 * [Valuation Table].[MAUC])
FROM (
SELECT [MAUC] FROM Emily
ORDER BY [Emily].[MAUC]
OFFSET ((SELECT COUNT(*) FROM Emily) - 1) / 2 ROWS
FETCH NEXT 1 + (1 - (SELECT COUNT(*) FROM Emily) % 2) ROWS ONLY
) AS [Valuation Table] )
I believe that using Over(Partition By) may help me to partition by t_item however I am at a loss as to where to insert it into the code. I am quite new to SQL and my lack of formal training is starting to show.
If you have any other suggestions please share.
Any help would be much appreciated!

This one caught my attention, so I'm posting two options:
The first is a straight cte approach, and the second uses temp tables. The cte approach is fine for smaller data sets, but performance suffers as the series expands.
Both options will calculate the RUNNING Min, Max, Mean, Median, and Mode for a data series
Just a couple of items before we get into it. The normalized structure is ID and Measure.
- The ID could be a date or identity.
- The Measure is any numeric value
- Median is the mid-value of the sorted series. With an even number of observations, we return the average of the two middle records
- Mode is represented as ModeR1 and ModeR2. If there are no repeated values, we show the min/max range
OK, let's take a look at the cte Approach
Declare @Table table (ID Int,Measure decimal(9,2))
Insert into @Table (ID,Measure) values
(1,25),
(2,75),
(3,50),
(4,25),
(5,12),
(6,66),
(7,45)
;with cteBase as (Select *,RowNr = Row_Number() over (Order By ID) From @Table),
cteExpd as (Select A.*,Measure2 = B.Measure,ExtRowNr = Row_Number() over (Partition By A.ID Order By B.Measure) From cteBase A Join cteBase B on (B.RowNr<=A.RowNr)),
cteMean as (Select ID,Mean=Avg(Measure2),Rows=Count(*) From cteExpd Group By ID),
cteMedn as (Select ID,MedRow1=ceiling(Rows/2.0),MedRow2=ceiling((Rows+1)/2.0) From cteMean),
cteMode as (Select ID,Mode=Measure2,ModeHits=count(*),ModeRowNr=Row_Number() over (Partition By ID Order By Count(*) Desc) From cteExpd Group By ID,Measure2)
Select A.ID
,A.Measure
,MinVal = min(Measure2)
,MaxVal = max(Measure2)
,Mean = max(B.Mean)
,Median = isnull(Avg(IIF(ExtRowNr between MedRow1 and MedRow2,Measure2,null)),A.Measure)
,ModeR1 = isnull(max(IIf(ModeHits>1,D.Mode,null)),min(Measure2))
,ModeR2 = isnull(max(IIf(ModeHits>1,D.Mode,null)),max(Measure2))
From cteExpd A
Join cteMean B on (A.ID=B.ID)
Join cteMedn C on (A.ID=C.ID)
Join cteMode D on (A.ID=D.ID and ModeRowNr=1)
Group By A.ID
,A.Measure
Order By A.ID
Returns
ID Measure MinVal MaxVal Mean Median ModeR1 ModeR2
1 25.00 25.00 25.00 25.000000 25.000000 25.00 25.00
2 75.00 25.00 75.00 50.000000 50.000000 25.00 75.00
3 50.00 25.00 75.00 50.000000 50.000000 25.00 75.00
4 25.00 25.00 75.00 43.750000 37.500000 25.00 25.00
5 12.00 12.00 75.00 37.400000 25.000000 25.00 25.00
6 66.00 12.00 75.00 42.166666 37.500000 25.00 25.00
7 45.00 12.00 75.00 42.571428 45.000000 25.00 25.00
This cte approach is very light and fast for smaller data series
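The median row arithmetic in cteMedn can be checked outside the database. The following is a minimal Python sketch, not part of the original answer: the function name running_median is made up for illustration, and it only mirrors the ceiling(Rows/2.0) and ceiling((Rows+1)/2.0) row picks on the sample series.

```python
from math import ceil

def running_median(measures):
    """Median of measures[0..i] for every i, using the same 1-based
    row-number arithmetic as cteMedn (hypothetical helper, for illustration)."""
    result = []
    for rows in range(1, len(measures) + 1):
        window = sorted(measures[:rows])        # like ordering by Measure2
        med_row1 = ceil(rows / 2)               # ceiling(Rows/2.0)
        med_row2 = ceil((rows + 1) / 2)         # ceiling((Rows+1)/2.0)
        middle = window[med_row1 - 1:med_row2]  # rows MedRow1..MedRow2 inclusive
        result.append(sum(middle) / len(middle))
    return result

medians = running_median([25, 75, 50, 25, 12, 66, 45])
print(medians)
```

For the seven sample values this gives the same figures as the Median column above.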
Now the Temp Table Approach
-- Generate Base Data -- Key ID and Key Measure
Select ID =TR_Date
,Measure=TR_Y10,RowNr = Row_Number() over (Order By TR_Date)
Into #Base
From [Chinrus-Series].[dbo].[DS_Treasury_Rates]
Where Year(TR_Date)>=2013
-- Extend Base Data one-to-many
Select A.*,Measure2 = B.Measure,ExtRowNr = Row_Number() over (Partition By A.ID Order By B.Measure) into #Expd From #Base A Join #Base B on (B.RowNr<=A.RowNr)
Create Index idx on #Expd (ID)
-- Generate Mean for Series
Select ID,Mean=Avg(Measure2),Rows=Count(*) into #Mean From #Expd Group By ID
Create Index idx on #Mean (ID)
-- Calculate Median Row Number(s) -- If even(avg of middle two rows)
Select ID,MednRow1=ceiling(Rows/2.0),MednRow2=ceiling((Rows+1)/2.0) into #Medn From #Mean
Create Index idx on #Medn (ID)
-- Calculate Mode
Select * into #Mode from (Select ID,Mode=Measure2,ModeHits=count(*),ModeRowNr=Row_Number() over (Partition By ID Order By Count(*) Desc,Measure2 Desc) From #Expd Group By ID,Measure2) A where ModeRowNr=1
Create Index idx on #Mode (ID)
-- Generate Final Results
Select A.ID
,A.Measure
,MinVal = min(Measure2)
,MaxVal = max(Measure2)
,Mean = max(B.Mean)
,Median = isnull(Avg(IIF(ExtRowNr between MednRow1 and MednRow2,Measure2,null)),A.Measure)
,ModeR1 = isnull(max(IIf(ModeHits>1,D.Mode,null)),min(Measure2))
,ModeR2 = isnull(max(IIf(ModeHits>1,D.Mode,null)),max(Measure2))
From #Expd A
Join #Mean B on (A.ID=B.ID)
Join #Medn C on (A.ID=C.ID)
Join #Mode D on (A.ID=D.ID and ModeRowNr=1)
Group By A.ID
,A.Measure
Order By A.ID
Returns
ID Measure MinVal MaxVal Mean Median ModeR1 ModeR2
2013-01-02 1.86 1.86 1.86 1.86 1.86 1.86 1.86
2013-01-03 1.92 1.86 1.92 1.89 1.89 1.86 1.92
2013-01-04 1.93 1.86 1.93 1.9033 1.92 1.86 1.93
2013-01-07 1.92 1.86 1.93 1.9075 1.92 1.92 1.92
2013-01-08 1.89 1.86 1.93 1.904 1.92 1.92 1.92
...
2016-07-20 1.59 1.37 3.04 2.2578 2.24 2.20 2.20
2016-07-21 1.57 1.37 3.04 2.257 2.235 2.61 2.61
2016-07-22 1.57 1.37 3.04 2.2562 2.23 2.20 2.20
Both approaches were validated in Excel.
I should add that in the final query, you could certainly add or remove items such as STD or Total.

Related

IF Else or Case Function for SQL select problem

Hi, I would like to write a select expression using CASE or IF/ELSE. It seems simple from a logic perspective, but I can't get it to work. I am joining two tables: the first is a customer record table with a date filter called min_del_date, and the second is the model scoring table with BIN and update_date columns.
There are two pieces of logic I want to implement:
- Pick the model score from the month before min_del_date
- If the model score from the month before delivery is greater than 50 (BIN > 50), pick the model score for the same month as min_del_date instead
My code for the 1st logic is below
with cust as (
select
distinct cust_no, max(del_date) as del_date, min(del_date) as min_del_date, (EXTRACT(YEAR FROM min(del_date)) -1900)*12 + EXTRACT(MONTH FROM min(del_date)) AS upd_seq
from customer.cust_history
group by 1
)
,model as (
select party_id, update_date, upd_seq, bin, var_data8, var_data2
from
(
select
party_id, update_date, bin, var_data8, var_data2,
(EXTRACT(YEAR FROM UPDATE_DATE) -1900)*12 + EXTRACT(MONTH FROM UPDATE_DATE) AS upd_seq,
dense_Rank() over (partition by (EXTRACT(YEAR FROM UPDATE_DATE) -1900)*12 + EXTRACT(MONTH FROM UPDATE_DATE) order by update_date desc) as rank1
from
(
select party_id,update_date, bin, var_data8, var_data2
from model.rpm_model
group by party_id,update_date, bin, var_data8, var_data2
) model
)model_final
where rank1 = 1
)
-- Add model scores
-- 1st logic Picking the model score that was the month before delivery date
select *
from
(
select cust.cust_no, cust.del_date, cust.min_del_date, model.upd_seq, model.bin
from cust
left join model
on cust.cust_no = model.party_id
and cust.upd_seq = model.upd_seq + 1
)a
Now I am struggling to add the 2nd logic to the same query. Any assistance would be appreciated.
cust table
cust_no min_del_date upd_seq
123 2021-01-11 1453
234 2020-06-29 1446
456 2020-07-20 1447
model table
party_id update_date upd_seq BIN
123 2020-11-30 1451 22
123 2020-12-25 1452 54
123 2020-01-11 1453 14
234 2020-05-23 1445 76
234 2020-06-18 1446 48
234 2020-07-23 1447 12
456 2020-06-18 1446 23
456 2020-07-23 1447 39
456 2020-08-21 1448 21
desired results
cust_no min_del_date model.upd_seq update_date BIN
123 2021-01-11 1453 2020-01-11 14
234 2020-06-29 1446 2020-06-18 48
456 2020-07-20 1446 2020-06-18 23
Update
I managed to find the solution myself; thanks to everyone who looked at this question. The solution is below
select a.cust_no, a.del_date, a.min_del_date, b.update_date, b.upd_seq, b.bin
from
(
select cust.cust_no, cust.del_date, cust.min_del_date,
CASE WHEN model.BIN <=50 THEN model.upd_seq WHEN BIN > 50 THEN model.upd_seq +1 ELSE NULL END as upd_seq
from cust
inner join model
on cust.cust_no = model.party_id
and cust.upd_seq = model.upd_seq + 1
)a
inner join model b
on a.cust_no = b.party_id
and a.upd_seq = b.upd_seq
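The CASE logic can be traced by hand on the sample rows. Below is a minimal Python sketch under the assumption that there is at most one model row per party and month; the function name pick_scores is made up for illustration and is not part of the original query.

```python
# Sample rows copied from the question's cust and model tables.
cust = [
    (123, "2021-01-11", 1453),
    (234, "2020-06-29", 1446),
    (456, "2020-07-20", 1447),
]
model = [
    (123, "2020-11-30", 1451, 22),
    (123, "2020-12-25", 1452, 54),
    (123, "2020-01-11", 1453, 14),
    (234, "2020-05-23", 1445, 76),
    (234, "2020-06-18", 1446, 48),
    (234, "2020-07-23", 1447, 12),
    (456, "2020-06-18", 1446, 23),
    (456, "2020-07-23", 1447, 39),
    (456, "2020-08-21", 1448, 21),
]

def pick_scores(cust, model):
    """Hypothetical helper: choose the prior-month score, or the same-month
    score when the prior-month BIN is above 50."""
    # Assumption: one model row per (party_id, upd_seq).
    by_key = {(party, seq): (upd, bin_) for party, upd, seq, bin_ in model}
    picked = []
    for cust_no, min_del_date, upd_seq in cust:
        prior = by_key.get((cust_no, upd_seq - 1))
        if prior is None:
            continue  # inner join: no prior-month score means no output row
        # Same decision as the CASE expression in the query above.
        target_seq = upd_seq - 1 if prior[1] <= 50 else upd_seq
        chosen = by_key.get((cust_no, target_seq))
        if chosen is not None:
            picked.append((cust_no, min_del_date, target_seq, chosen[0], chosen[1]))
    return picked

results = pick_scores(cust, model)
for row in results:
    print(row)
```

On the sample data this yields the three rows listed under desired results.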

T-SQL Override special rates and generate final date range

I have a transaction table with a date range and a basic rate for each range. I have another table of special rates, each with its own date range and rate. I would like to split each original transaction into multiple records wherever a special rate falls within the transaction's date range.
For simplicity, I have created two tables with limited columns
CREATE TABLE #ClientTrx (ClientId int, StartDate Date, EndDate Date, Rate decimal(10,2))
CREATE TABLE #SpecialRate (ClientId int, StartDate Date, EndDate Date, Rate decimal(10,2))
insert into #ClientTrx select 1, '1/1/2020', '1/15/2020', 10
insert into #ClientTrx select 1, '1/16/2020', '1/31/2020', 10
insert into #ClientTrx select 2, '1/1/2020', '1/15/2020', 20
insert into #ClientTrx select 2, '1/16/2020', '1/31/2020', 20
insert into #ClientTrx select 2, '2/1/2020', '2/13/2020', 20
insert into #SpecialRate select 1, '12/25/2019', '1/3/2020', 13
insert into #SpecialRate select 1, '1/4/2020', '1/6/2020', 15
insert into #SpecialRate select 1, '1/11/2020', '1/18/2020', 12
insert into #SpecialRate select 2, '1/25/2020', '1/31/2020', 23
insert into #SpecialRate select 2, '2/4/2020', '2/8/2020', 25
insert into #SpecialRate select 2, '2/11/2020', '2/29/2020', 22
I need help writing a query that produces the following results:
ClientId StartDate EndDate Rate
1 2020-01-01 2020-01-03 13.00 special rate
1 2020-01-04 2020-01-06 15.00 special rate
1 2020-01-07 2020-01-10 10.00 regular rate
1 2020-01-11 2020-01-15 12.00 special rate
1 2020-01-16 2020-01-18 12.00 special rate splitting pay period
1 2020-01-19 2020-01-31 10.00 regular rate
2 2020-01-01 2020-01-15 20.00 regular rate
2 2020-01-16 2020-01-24 20.00 regular rate
2 2020-01-25 2020-01-31 23.00 special rate
2 2020-02-01 2020-02-03 20.00 regular rate
2 2020-02-04 2020-02-08 25.00 special rate
2 2020-02-09 2020-02-10 20.00 regular rate
2 2020-02-11 2020-02-13 22.00 special rate
I think it's possible using a CTE, but I can't figure it out. Can anyone please help?
Note: I have made some changes to my input and expected output. I think I need one more group level. Can you please help?
This approach uses an ad-hoc tally table to expand the datasets and then applies Gaps-and-Islands for the final summary
Example
;with cte as (
Select A.ClientId
,D
,Rate = coalesce(NewRate,A.Rate)
,Grp = datediff(day,'1900-01-01',D) - row_number() over (partition by ClientID,coalesce(NewRate,A.Rate) Order by D)
From #ClientTrx A
Cross Apply (
Select Top (DateDiff(DAY,StartDate,EndDate)+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),StartDate)
From master..spt_values n1,master..spt_values n2
) B
Outer Apply (
Select NewRate=Rate
From #SpecialRate
Where D between StartDate and EndDate
and ClientId=A.ClientID
) C
)
Select ClientID
,StartDate= min(D)
,EndDate = max(D)
,Rate = Rate
From cte
Group By ClientID,Grp,Rate
Order by ClientID,min(D)
Returns
ClientID StartDate EndDate Rate
1 2020-01-01 2020-01-03 13.00
1 2020-01-04 2020-01-06 15.00
1 2020-01-07 2020-01-10 10.00
1 2020-01-11 2020-01-18 12.00
1 2020-01-19 2020-01-31 10.00
2 2020-01-01 2020-01-24 20.00
2 2020-01-25 2020-01-31 23.00
2 2020-02-01 2020-02-03 20.00
2 2020-02-04 2020-02-08 25.00
2 2020-02-09 2020-02-10 20.00
2 2020-02-11 2020-02-13 22.00
Notes:
Cross Apply B generates a record for each date between StartDate and EndDate in #ClientTrx.
Outer Apply C attempts to find the exception rate (NewRate) for that date.
The CTE generates one record per date, carrying either the default or the exception rate.
Grp stays constant across consecutive dates that share a rate and changes otherwise. This is a simple technique to "feed" the Gaps-and-Islands.
Then it becomes a small matter to group the results from cte by ClientID and Grp.
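To see why subtracting a row number from the day number forms the islands, here is a minimal Python sketch of the same three steps on the question's sample data. It is an illustration only: the function name split_by_rate is made up, and it assumes that transactions for a client do not overlap and that at most one special rate covers any given day.

```python
from datetime import date, timedelta

# Sample rows copied from the question (ClientId, StartDate, EndDate, Rate).
client_trx = [
    (1, date(2020, 1, 1), date(2020, 1, 15), 10),
    (1, date(2020, 1, 16), date(2020, 1, 31), 10),
    (2, date(2020, 1, 1), date(2020, 1, 15), 20),
    (2, date(2020, 1, 16), date(2020, 1, 31), 20),
    (2, date(2020, 2, 1), date(2020, 2, 13), 20),
]
special_rate = [
    (1, date(2019, 12, 25), date(2020, 1, 3), 13),
    (1, date(2020, 1, 4), date(2020, 1, 6), 15),
    (1, date(2020, 1, 11), date(2020, 1, 18), 12),
    (2, date(2020, 1, 25), date(2020, 1, 31), 23),
    (2, date(2020, 2, 4), date(2020, 2, 8), 25),
    (2, date(2020, 2, 11), date(2020, 2, 29), 22),
]

def split_by_rate(client_trx, special_rate):
    """Hypothetical helper mirroring the CTE plus the final Group By."""
    # Step 1: one (client, day, rate) row per day, like the CTE.
    daily = []
    for client, start, end, rate in client_trx:
        for offset in range((end - start).days + 1):
            day = start + timedelta(days=offset)
            new_rate = next(
                (r for c, s, e, r in special_rate if c == client and s <= day <= e),
                None,
            )
            daily.append((client, day, rate if new_rate is None else new_rate))

    # Step 2: Grp = day number minus row number within (client, rate).
    row_numbers = {}
    islands = {}
    for client, day, rate in sorted(daily):
        row_numbers[(client, rate)] = row_numbers.get((client, rate), 0) + 1
        grp = day.toordinal() - row_numbers[(client, rate)]
        first, last = islands.get((client, grp, rate), (day, day))
        islands[(client, grp, rate)] = (min(first, day), max(last, day))

    # Step 3: one output row per island, ordered by client and start date.
    return sorted((c, first, last, r) for (c, _, r), (first, last) in islands.items())

ranges = split_by_rate(client_trx, special_rate)
for row in ranges:
    print(row)
```

Within one client and rate, consecutive days raise the day number and the row number together, so their difference stays fixed; any gap in the dates shifts it.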

Add remaining value to next rows in sql server

I have a table, shown below, that contains customer electricity volume per period. The available data looks like this:
OwnerID StartDate EndDate Volume
1 2019-01-01 2019-01-15 10.40
1 2019-01-16 2019-01-31 5.80
1 2019-02-01 2019-02-10 7.90
1 2019-02-11 2019-02-28 8.50
2 2019-03-01 2019-03-04 10.50
Another table holds each customer's existing remaining volume. The two tables are connected by the OwnerID column.
OwnerID ExistingVolume
1 0.90
2 0.60
Now add (apply) the ExistingVolume to the current Volume (first table) as follows:
Calculate the new volume as a whole number, and carry the remaining decimal value over to the customer's next period.
So the expected result set should look like this:
OwnerId StartDate EndDate CalulatedVolume RemainingExistingVolume
1 2019-01-01 2019-01-15 11 0.30
1 2019-01-16 2019-01-31 6 0.10
1 2019-02-01 2019-02-10 8 0.00
1 2019-02-11 2019-02-28 8 0.50
2 2019-03-01 2019-03-04 11 0.10
Don't round off the CalulatedVolume. Just take the whole part of table1.Volume + table2.ExistingVolume.
The remaining decimal value (from the 1st row) should be added to the next row's table1.Volume.
Could someone suggest how to achieve this in a SQL query?
If I understand correctly, you want to accumulate the "error" from rounding and apply that against the value in the second table.
You can use a cumulative sum for this purpose -- along with some arithmetic:
select t1.ownerid, t1.startdate, t1.enddate,
round(t1.volume, 0) as calculatedvolume,
( sum( t1.volume - round(t1.volume, 0) ) over (partition by t1.ownerid order by t1.startdate) +
t2.existingvolume
) as remainingexisting
from table1 t1 left join
table2 t2
on t1.ownerid = t2.ownerid;
You have a non-standard definition of rounding. This can be implemented as ceiling(x - 0.5). With this definition, the code is:
select t1.ownerid, t1.startdate, t1.enddate,
ceiling(t1.volume - 0.5) as calculatedvolume,
( sum( t1.volume - ceiling(t1.volume - 0.5) ) over (partition by t1.ownerid order by t1.startdate) +
t2.existingvolume
) as remainingexisting
from table1 t1 left join
table2 t2
on t1.ownerid = t2.ownerid;
Here is a db<>fiddle.
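For reference, the carry-forward arithmetic described in the question can be traced step by step. The following minimal Python sketch is a sequential illustration, not a translation of the queries above: the function name apply_carry is made up, rows are assumed to be sorted by owner and start date, and volumes are assumed to be non-negative.

```python
from decimal import Decimal

# Sample rows copied from the question (OwnerID, StartDate, EndDate, Volume).
volumes = [
    (1, "2019-01-01", "2019-01-15", Decimal("10.40")),
    (1, "2019-01-16", "2019-01-31", Decimal("5.80")),
    (1, "2019-02-01", "2019-02-10", Decimal("7.90")),
    (1, "2019-02-11", "2019-02-28", Decimal("8.50")),
    (2, "2019-03-01", "2019-03-04", Decimal("10.50")),
]
existing = {1: Decimal("0.90"), 2: Decimal("0.60")}

def apply_carry(volumes, existing):
    """Hypothetical helper: add the carried remainder to each period,
    keep the whole part, and carry the new remainder forward."""
    carry = dict(existing)  # running remainder per owner
    out = []
    for owner, start, end, volume in volumes:  # assumed sorted by owner, start
        total = volume + carry.get(owner, Decimal("0"))
        whole = int(total)  # truncation, which equals floor for non-negative totals
        carry[owner] = total - whole
        out.append((owner, start, end, whole, carry[owner]))
    return out

carried = apply_carry(volumes, existing)
for row in carried:
    print(row)
```

On the sample data this gives the whole volumes and remainders shown in the question's expected result set.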

oracle Filter duplicate column values

Below is a sample data extract. I want to drop the duplicate row (the last one in this example). How can I easily select the data without that extra record?
ID YEAR CNT VOLUME INT_VOLUME RATE INT_RATE GM GM_RCNT
545 2016 12 5508 5508 1604 1604 0.71 NULL
545 2017 5 1138 2731 824 1977 0.28 -50.42
545 2018 NULL NULL -45 2351 NULL NULL NULL
626 2016 12 679862 679862 252693 252693 0.63 NULL
626 2017 12 705365 705365 282498 282498 0.6 3.75
626 2018 12 707472 707472 291762 291762 0.59 0.3
626 2018 NULL NULL 711372 NULL 295186 NULL NULL --Filter such rows in select
You can choose one year for each id using row_number():
select t.*
from (select t.*,
row_number() over (partition by id, year order by id) as seqnum
from t
) t
where seqnum = 1;
This chooses an arbitrary row to keep. You can adjust the order by to refine which row you want to keep. You can order by rowid, but there is no guarantee that it is the "earliest" row. You need a date or sequence column for that purpose.
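As one way to adjust the order by, the sketch below keeps the row with a non-NULL CNT for each id and year. It is a minimal illustration run through Python's standard sqlite3 module (window functions need SQLite 3.25 or later), using only a subset of the question's columns. SQLite is a stand-in here, so the ordering is written as (cnt is null); in Oracle the equivalent would be order by cnt nulls last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the question's columns, enough to show the duplicate (id, year).
conn.execute("create table t (id integer, year integer, cnt integer, volume integer)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (626, 2016, 12, 679862),
        (626, 2017, 12, 705365),
        (626, 2018, 12, 707472),
        (626, 2018, None, None),  # the extra row to filter out
    ],
)

# Rows with a NULL cnt sort last inside each (id, year) partition,
# so seqnum = 1 lands on the populated row.
rows = conn.execute(
    """
    select id, year, cnt, volume
    from (select t.*,
                 row_number() over (partition by id, year
                                    order by (cnt is null), cnt desc) as seqnum
          from t) ranked
    where seqnum = 1
    order by id, year
    """
).fetchall()
print(rows)
```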

TVF UDF does not return the same data as SELECT

Calling the UDF like so:
SELECT
product_name,
SUM(quantity) AS SumQty,
SUM(face_value) AS SumFaceValue,
SUM(net_cost)AS SumNetCost,
SUM(face_value - net_cost) AS SumScripRebate,
organization_name
FROM getSalesSummary(@GLSCOrgId, @BeginDate, @EndDate) getSalesSummary
GROUP BY product_name, organization_name
ORDER BY product_name
yields:
"Chili's 1 25.00 22.75 2.25 Sample Organization 1
CVS/pharmacy 1 25.00 23.50 1.50 Sample Organization 1
Macy's 1 100.00 90.00 10.00 Sample Organization 1"
Using the UDF logic and testing the results with SELECT:
SELECT
product_name,
SUM(quantity) AS SumQty,
SUM(face_value) AS SumFaceValue,
SUM(net_cost) AS SumNetCost,
SUM(face_value - net_cost) AS SumScripRebate,
organization_name
FROM #ReturnTable
GROUP BY product_name, organization_name
ORDER BY product_name
yields:
"Chili's 4 100.00 91.00 9.00 Sample Organization 1
CVS/pharmacy 1 25.00 23.50 1.50 Sample Organization 1
Macy's 1 100.00 90.00 10.00 Sample Organization 1"
#ReturnTable is the table returned by the UDF and is created like so:
INSERT INTO #ReturnTable(product_name,
unit_price,
quantity,
face_value,
net_cost,
organization_name)
(select * from #TablePartial UNION select * from #TableClosed)
The test with the SELECT and variables is returning the correct data, but calling the UDF is not getting those other 3 Chili's records. I am using the same data for parameters. I'm quite new to UDFs and I'm not sure why it would return different data than what the SELECT does. Any suggestions and/or answers?
You probably need UNION ALL not UNION
Looking at the two result sets, it adds up as though the 4 Chili's rows are all the same.
Chili's 1 25.00 22.75 2.25 Sample Organization 1
Chili's 1 25.00 22.75 2.25 Sample Organization 1
Chili's 1 25.00 22.75 2.25 Sample Organization 1
Chili's 1 25.00 22.75 2.25 Sample Organization 1
-------------------------------------------------------------
Chili's 4 100.00 91.00 9.00 Sample Organization 1
Using UNION will remove the duplicates, leaving you with one row.
The only thing I can think of is the UNION: change it to UNION ALL. UNION will eliminate dups.
Run these queries to see the difference
-- UNION removes duplicates: returns one row
select 1 as a
union
select 1
union
select 1
-- UNION ALL keeps duplicates: returns three rows
select 1 as a
union all
select 1
union all
select 1
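The same comparison can be run without a SQL Server instance. This is a minimal sketch using Python's standard sqlite3 module; the tables partial_rows and closed_rows and the way the four identical Chili's rows are split between them are hypothetical, chosen only to reproduce the 1 versus 4 quantity totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The two queries from the answer.
union_rows = conn.execute("select 1 as a union select 1 union select 1").fetchall()
union_all_rows = conn.execute(
    "select 1 as a union all select 1 union all select 1"
).fetchall()

# Hypothetical tables holding four identical rows between them.
for name in ("partial_rows", "closed_rows"):
    conn.execute(
        f"create table {name} (product_name text, quantity integer, "
        "face_value real, net_cost real)"
    )
chilis = ("Chili's", 1, 25.00, 22.75)
conn.executemany("insert into partial_rows values (?, ?, ?, ?)", [chilis] * 3)
conn.execute("insert into closed_rows values (?, ?, ?, ?)", chilis)

# UNION collapses the identical rows before SUM sees them; UNION ALL does not.
qty_union = conn.execute(
    "select sum(quantity) from "
    "(select * from partial_rows union select * from closed_rows)"
).fetchone()[0]
qty_union_all = conn.execute(
    "select sum(quantity) from "
    "(select * from partial_rows union all select * from closed_rows)"
).fetchone()[0]
print(union_rows, union_all_rows, qty_union, qty_union_all)
```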