SQL Window Max() function having issues in code - sql

I am writing a window function that is supposed create a month window and only grab records that has the max value in the update flag field within that said month window.
I am having issues with my window function its still showing all results in the window when it should be showing the only the max value.
I have left my code below. Please help.
SELECT
gb1.SKU_Id,
gb1.Warehouse_Code,
gb1.Period_Start,
gb1.country,
tm.c445_month,
tm.report_date,
gb1.update_flag,
max(gb1.update_flag) over (partition by tm.yearmonth order by gb1.update_flag range between unbounded preceding and current row ) as update_window,
SUM(gb1.TOTAL_NEW_SALES_FORECAST) AS dc_forecast
FROM BAS_E2E_OUTPUT_GLOBAL_FCST gb1
inner join (
SELECT
gb2.SKU_Id,
gb2.Warehouse_Code,
gb2.Period_Start,
gb2.country,
gb2.update_flag,
gb2.report_date,
tm1.week_date,
tm1.c445_month,
tm1.yearmonth
FROM BAS_E2E_OUTPUT_GLOBAL_FCST as gb2
left join (
select distinct(week_date) as week_date,
c445_month,
yearmonth
from "PROD"."INV_PROD"."BAS_445_MONTH_ALIGNMENT"
group by c445_month, week_date, yearmonth
) as tm1 on gb2.report_date = tm1.week_date
group by SKU_Id,
Warehouse_Code,
Period_Start,
country,
update_flag,
report_date,
tm1.week_date,
tm1.c445_month,
tm1.yearmonth
) as tm
on gb1.report_date = tm.week_date
and gb1.SKU_ID = tm.sku_id
and gb1.Warehouse_Code = tm.warehouse_code
and gb1.Period_Start = tm.period_start
and gb1.country = tm.country
GROUP BY
gb1.SKU_Id,
gb1.Warehouse_Code,
gb1.Period_Start,
gb1.country,
tm.c445_month,
tm.yearmonth,
tm.report_date,
gb1.update_flag

You are currently using MAX with the window being defined as every preceding row up and including the current one. Hence, rightfully the max value it returns should probably change for each record. Perhaps you wanted to take the max over a fixed partition:
MAX(gb1.update_flag) OVER (PARTITION BY tm.yearmonth) AS update_window
By the way, if you did really intend to use MAX with your current window logic, on most versions of SQL the ORDER BY clause can be simplified to:
MAX(gb1.update_flag) OVER (PARTITION BY tm.yearmonth ORDER BY gb1.update_flag) AS update_window
That is, the default range is unbounded preceding to the current row, so it is not necessary to say this.

Related

SQL Server: Generate squashed range data from daily dates by an id

Basically to sum up the goal. I want to generate something using the following data. Where it subtotals like excel where "each change in ... ordered however" generate summary records and only generate summary records and squash date ranges.
All I want are the count records would generate rows and would copy the fee data except for the begin/end fields would be the ranged dates. So first record would be fee 9858 from 3/31 - 4/14 for example. Each count row would be a new fee record.
I am not sure which combination of grouping, partition by.... and such to get what I need or if there is something else I can use. I can provide sql if needed but I am mainly looking for finding the right combination of tools(partition by, grouping, rollup....) that would provide this functionality.
WITH [ag]
AS (SELECT *,
LAG([Fee_ID]) OVER (ORDER BY [FeeTypeID], [FeeBeginDate]) [FirstFee],
LAG([Fee_ID]) OVER (ORDER BY [FeeTypeID], [FeeBeginDate] DESC) [LastFee]
FROM [dbo].[HHTFees]
WHERE [Retailer] = 517),
[agf]
AS (SELECT *,
'Beg' [FeeStopType]
FROM [ag]
WHERE [ag].[FirstFee] <> [ag].[Fee_ID]
OR [ag].[FirstFee] IS NULL),
[agl]
AS (SELECT *,
'End' [FeeStopType]
FROM [ag]
WHERE [ag].[LastFee] <> [ag].[Fee_ID]
OR [ag].[LastFee] IS NULL),
[results]
AS (SELECT *
FROM [agf]
UNION
SELECT *
FROM [agl]),
[indexed]
AS (SELECT *,
ROW_NUMBER() OVER (ORDER BY [results].[FeeTypeID], [results].[FeeBeginDate]) [RowNum]
FROM [results])
SELECT [Starts].[Retailer],
[Starts].[Chain_key],
[Starts].[Fee_ID],
[Starts].[FeeTypeID],
[Starts].[FeeChgTypeID],
[Starts].[Fee],
[Starts].[FeeDescription],
[Starts].[FeeTypeDescription],
[Starts].[FeeBeginDate],
[Ends].[FeeEndDate],
[Starts].[FeeAmt],
[Starts].[HHT_ID],
[Starts].[CreatedBy],
[Starts].[CreatedDate],
[Starts].[ModifiedBy],
[Starts].[ModifiedDate],
[Starts].[MsgID],
[Starts].[ScopeOrder],
[Starts].[Scope],
[Starts].[FirstFee],
[Starts].[LastFee],
[Starts].[FeeStopType],
[Starts].[RowNum]
FROM [indexed] [Starts]
INNER JOIN [indexed] [Ends]
ON [Starts].[Fee_ID] = [Ends].[Fee_ID]
AND [Ends].[RowNum] = [Starts].[RowNum] + 1;
I used lag to find where the fee id changed sorted by feetype / start date. This allowed me to mimic when fee id changed sorted by begin date like excel subtotal. Now to optimize and tweak to fit the set of data.

SQL BigQuery - Error that variable is not grouped by even though it is

SQL Code:
SELECT community_table.community_name,
community_table.id,
DATE(timestamp) as date,
ifnull(COUNT(distinct app_opened.user_id), 0) as num_opened_DAU,
lag(COUNT(distinct app_opened.user_id)) OVER
(ORDER BY community_table.community_name, community_table.id, DATE(timestamp)) as pre_Value
FROM *** app_opened
LEFT JOIN (
SELECT DISTINCT id, community_id_2, context_traits_first_name, context_traits_last_name
FROM (
SELECT *
FROM ***,
UNNEST (JSON_EXTRACT_ARRAY(context_traits_community_ids, "$")) as community_id_2
)
GROUP by community_id_2, id, context_traits_first_name, context_traits_last_name) as community_id_table
ON community_id_table.id = app_opened.user_id
LEFT JOIN (
SELECT DISTINCT id, name as community_name
FROM ***) as community_table
ON TO_JSON_STRING(community_table.id) = community_id_table.community_id_2
WHERE app_opened.user_id is not null AND
EXTRACT(DAYOFWEEK FROM DATE(timestamp)) = 2 AND
community_table.community_name is not null
GROUP BY community_table.community_name, community_table.id, DATE(timestamp)
Error Message:
I am quite confused on what could be going wrong here, as the error says that timestamp is not grouped, even though I have grouped it at the bottom. I tried including just timestamp rather than Date(timestamp), but that ruins the table data that I am trying to create, where I find the number of users on a single day. Does anyone have any other ideas? My goal is for a single row, get the previous row's data, but because I am grouping by specific metrics, I need to make sure they are ordered by them as well. Thank you so much!
I think you simply need to modify OVER part as:
OVER (PARTITION BY community_table.community_name, community_table.id, DATE(timestamp)) as pre_Value
UPDATE. Seems that the problem was caused by using DATE() function within OVER so it can be solved by using DATE(timestamp) inside of subquery and passing alias to OVER

Displaying a sum incrementally per row using window function in postgresql

I need to enter the total length of verblijfsduur (stay) per reisnr (trip) incrementally according to the order in which the celestial objects are visited.
This what I need: (ignore the tot_duur column, that is the sum of all verblijfsduur)
This what I get, see it dosn't show incrementally according to order, it shows the total perreisnr (trip number):
I don't know how I could indicate this in the PARTITION BY part of my query:
SELECT re.reisnr, be.volgnr, be.objectnaam, be.verblijfsduur
,sum(be.verblijfsduur) OVER (PARTITION BY re.reisnr ORDER BY re.reisnr ) as
inc_duur
FROM reizen re INNER JOIN bezoeken be ON re.reisnr = be.reisnr
ORDER BY re.reisnr, be.volgnr, be.objectnaam, be.verblijfsduur
I found the problem, thanks/sorry if you had an answer, I had to order by be.volgnr
SELECT re.reisnr, be.volgnr, be.objectnaam, be.verblijfsduur
,SUM(be.verblijfsduur) OVER (PARTITION BY re.reisnr ORDER BY be.volgnr) as inc_duur
FROM reizen re INNER JOIN bezoeken be ON re.reisnr = be.reisnr
ORDER BY re.reisnr, be.volgnr, be.objectnaam, be.verblijfsduur

I am stuck on getting a previous value

I have been working on this SQL code for a bit and I cannot get it to display like I want. I have an operation that we send parts outside of our business but there is no time stamp on when that operation sent out.
I am taking the previous operation's last labor date and the purchase order creation date to try and find out how long it takes that department to issued a purchase order.
I have tried LAST_Value to add to my query. I have even played with LAG and couldn't get a anything but errors.
SELECT
JobOpDtl.JobNum,
JobOpDtl.OprSeq,
JobOpDtl.OpDtlDesc,
LastValue.ClockInDate,
LastValue.LastValue
FROM Erp.JobOpDtl
LEFT OUTER JOIN Erp.LaborDtl ON
LaborDtl.JobNum = JobOpDtl.JobNum
and LaborDtl.OprSeq = JobOpDtl.OprSeq
LEFT OUTER JOIN (
Select
LaborDtl.JobNum,
LaborDtl.OprSeq,
MAX(LaborDtl.ClockInDate) as ClockInDate,
LAST_VALUE (LaborDtl.ClockInDate) OVER (PARTITION BY OprSeq ORDER BY JobNum) as LastValue
FROM Erp.LaborDtl
GROUP BY
LaborDtl.JobNum,
LaborDtl.OprSeq,
LaborDtl.ClockInDate
) as LastValue ON
JobOpDtl.JobNum = LastValue.JobNum
and JobOpDtl.OprSeq = LastValue.OprSeq
WHERE JobOpDtl.JobNum = 'PA8906'
GROUP BY
JobOpDtl.JobNum,
LastValue.OprSeq,
JobOpDtl.OpDtlDesc,
JobOpDtl.OprSeq,
LastValue.ClockInDate,
LastValue.LastValue
No errors, just not displaying how I am wanting it.
I would like it to display the OperSeq with the previous OperSeq last transaction date.
The basic function you want is LAG (as you suggested) but you need to wrap it in a COALESCE. Here is a sample code that illustrates the concept
SELECT * INTO #Jobs
FROM (VALUES ('P1','Step1', '2019-04-01'), ('P1','Step2', '2019-04-02')
, ('P1','Step3', '2019-04-03'), ('P1','Step4', NULL),
('P2','Step1', '2019-04-01'), ('P2','Step2', '2019-04-03')
, ('P2','Step3', '2019-04-06'), ('P2','Step4', NULL)
) as JobDet(JobNum, Descript, LastDate)
SELECT *
, COALESCE( LastDate, LAG(LastDate,1)
OVER(PARTITION BY JobNum
ORDER BY COALESCE(LastDate,GETDATE()))) as LastValue
FROM #Jobs
ORDER BY JobNum, Descript
DROP TABLE #Jobs
To apply it to your specific problem, I'd suggest using a COMMON TABLE EXPRESSION that replaces LastValue and using that instead of the raw table for your queries.
Your example picture doesn't match any tables you reference in your code (it would help us significantly if you included code that created temp tables matching those referenced in your code) so this is a guess, but it will be something like this:
;WITH cteJob as (
SELECT JobNum, OprSeq, OpDtlDesc, ClockInDate
, COALESCE( LastValue, LAG(LastValue,1)
OVER(PARTITION BY JobNum
ORDER BY COALESCE(LastValue,GETDATE()))) as LastValue
FROM Erp.JobOptDtl
) SELECT *
FROM cteJob as J
LEFT OUTER JOIN LaborDtl as L
on J.JobNum = JobNum
AND J.OprSeq = L.OprSeq
BTW, if you clean up your question to provide a better example of your data (i.e. SELECT INTO sttements like in the start of my answer that produce tables that correspond to the tables in your code instead of an image of an excel file) I might be able to get you closer to what you need, but hopefully this is enough to get you on the right track and it's the best I can do with what you've provided so far.

Collapse Data in Sql without stored precedure or function if a value is the same as the value from row above

I got a problem regarding grouping if a value is the same as in the row above.
Our statement looks like this:
SELECT pat_id,
treatData.treatmentdate AS Date,
treatMeth.name AS TreatDataTableInfo,
treatData.treatmentid AS TreatID
FROM dialysistreatmentdata treatData
LEFT JOIN hdtreatmentmethods treatMeth
ON treatMeth.id = treatData.hdtreatmentmethodid
WHERE treatData.hdtreatmentmethodid IS NOT NULL
AND Year(treatData.treatmentdate) >= 2013
AND ekeyid = 12
ORDER BY treatData.ekeyid,
treatmentdate DESC,
treatdatatableinfo;
The output looks like this:
The desired output should be grouped if the value is the same as in the row/rows before and ther should be a ToDate as you can see in the screenshot which is the date of the next row -1 day.
The desired output should look like this:
I hope someone has a solution regarding this matter!
Or maybe someone has an idea how to solve this problem within qlikview.
Looking forward for solutions
Michael
You want to collapse episodes of treatment into single rows. This is a "gaps-and-islands" problem. I like the difference of row numbers approach:
select patid, min(date) as fromdate, max(date) as todate, TreatDataTableInfo,
min(treatid)
from (select td.Pat_ID, td.TreatmentDate As Date, tm.Name As TreatDataTableInfo,
td.TreatmentID As TreatID,
row_number() over (partition by td.pat_id order by td.treatmentdate) as seqnum_p,
row_number() over (partition by td.pat_id, tm.name order by td.treatment_date) as seqnum_pn
from DialysisTreatmentData td Left join
HDTreatmentMethods tm
On tm.ID = td.HDTreatmentMethodID
where td.HDTreatmentMethodID Is Not Null And
td.TreatmentDate) >= '2013-01-01' and
EKeyID = 12
) t
group by patid, TreatDataTableInfo, (seqnum_p - seqnum_pn)
order by patid, TreatmentDate Desc, TreatDataTableInfo;
Note: This uses the ANSI standard window function row_number(), which is available in most databases.
Below is a possible Qlikview solution. I've put some comments in the script. If it's not clear just let me know. The result picture is below the script.
RawData:
Load * Inline [
Pat_ID,Date,TreatDataTableInfo,TreatId
PatNum_12,08.07.2016,HDF Pradilution,1
PatNum_12,07.07.2016,HDF Predilution,2
PatNum_12,23.03.2016,HD,3
PatNum_12,24.11.2015,HD,4
PatNum_12,22.11.2015,HD,5
PatNum_12,04.09.2015,HD,6
PatNum_12,01.09.2015,HD,7
PatNum_12,30.07.2015,HD,8
PatNum_12,12.01.2015,HD,9
PatNum_12,09.01.2015,HD,10
PatNum_12,26.08.2014,Hemodialysis,11
PatNum_12,08.07.2014,Hemodialysis,12
PatNum_12,23.05.2014,Hemodialysis,13
PatNum_12,19.03.2014,Hemodialysis,14
PatNum_12,29.01.2014,Hemodialysis,15
PatNum_12,14.12.2013,Hemodialysis,16
PatNum_12,26.10.2013,Hemodialysis,17
PatNum_12,05.10.2013,Hemodialysis,18
PatNum_12,03.10.2013,HD,19
PatNum_12,24.06.2013,Hemodialysis,20
PatNum_12,03.06.2013,Hemodialysis,21
PatNum_12,14.05.2013,Hemodialysis,22
PatNum_12,26.02.2013,HDF Postdilution,23
PatNum_12,23.02.2013,HDF Pradilution,24
PatNum_12,21.02.2013,HDF Postdilution,25
PatNum_12,07.02.2013,HD,26
PatNum_12,25.01.2013,HDF Pradilution,27
PatNum_12,18.01.2013,HDF Pradilution,28
];
GroupedData:
Load
*,
// assign new GroupId for all rows where the TreatDataTableInfo is equal
if( RowNo() = 1, 1,
if( TreatDataTableInfo <> peek('TreatDataTableInfo'),
peek('GroupId') + 1, peek('GroupId'))) as GroupId,
// assign new GroupSubId (incremental int) for all the records in each group
if( TreatDataTableInfo <> peek('TreatDataTableInfo'),
1, peek('GroupSubId') + 1) as GroupSubId,
// pick the first Date field value and spread it acccross the group
if( TreatDataTableInfo <> peek('TreatDataTableInfo'), TreatId, peek('TreatId_Temp')) as TreatId_Temp
Resident
RawData
;
Drop Table RawData;
right join (GroupedData)
// get the max GroupSubId for each group and right join it to
// the GroupedData table to remove the records we dont need
MaxByGroup:
Load
max(GroupSubId) as GroupSubId,
GroupId
Resident
GroupedData
Group By
GroupId
;
// these are not needed anymore
Drop Fields GroupId, GroupSubId, TreatId;
// replace the old TreatId with the new TreatId_Temp field
// which contains the first TreatId for each group
Rename Field TreatId_Temp to TreatId;