Assistance with PERCENTILE_CONT function and GROUP BY error - sql

All,
I am having problems with the below query. I am trying to get stat data from our database for the last 3 years but I keep getting the error message:
***Column 'OC_VDATA.DATA1' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.***
I know it has something to do with the DATA1 column, but I am not familiar enough with the PERCENTILE_CONT function to know what the solution is.
Anyone have any ideas?
WITH Q AS
(
SELECT stagingPLM.dbo.ITEM_CODES.ITEM_CODE,
AVG(OC_VDATA.DATA1) AS Mean,
STDEVP(OC_VDATA.DATA1) AS StandardDev,
PERCENTILE_CONT(0.5)
WITHIN GROUP (ORDER BY OC_VDATA.DATA1)
OVER (PARTITION BY stagingPLM.dbo.ITEM_CODES.ITEM_CODE) AS Median
FROM OC_VDATA INNER JOIN
OC_VDAT_AUX ON OC_VDATA.PARTNO = OC_VDAT_AUX.PARTNOAUX
AND OC_VDATA.DATETIME = OC_VDAT_AUX.DATETIMEAUX INNER JOIN
stagingPLM.dbo.ITEM_CODES ON LEFT(OC_VDATA.PARTNO, 12) = stagingPLM.dbo.ITEM_CODES.SPEC_NO
AND LEFT(OC_VDAT_AUX.PARTNOAUX, 12) = stagingPLM.dbo.ITEM_CODES.SPEC_NO
WHERE (OC_VDAT_AUX.UDL28 LIKE '%PLASTIC%')
AND (RIGHT(OC_VDATA.PARTNO, 6) = '036150')
AND (CAST(OC_VDAT_AUX.UDL40 AS DATETIME)
BETWEEN CONVERT(datetime, '2019-05-18 00:00:00', 102) AND CONVERT(datetime, '2022-05-18 00:00:00', 102))
GROUP BY stagingPLM.dbo.ITEM_CODES.ITEM_CODE
)
SELECT * FROM Q

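The error comes from mixing GROUP BY with a window function. In SQL Server, PERCENTILE_CONT is available only as an analytic (window) function, so it always needs an OVER clause; it is not an aggregate. Inside a query grouped by ITEM_CODE, the OC_VDATA.DATA1 referenced in WITHIN GROUP (ORDER BY OC_VDATA.DATA1) is therefore neither aggregated nor part of the GROUP BY, which is exactly what the message says.
It is better to calculate AVG, STDEVP and PERCENTILE_CONT all as window functions, instead of half through GROUP BY and half through a window function, and to use DISTINCT to collapse the result to one row per ITEM_CODE.
Keeping only the columns needed to reproduce the issue, you can rewrite the query as below to get the desired output.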
SELECT DISTINCT item_codes.item_code,
Avg(oc_vdata.data1)
over(
PARTITION BY item_codes.item_code) AS Mean,
Stdevp(oc_vdata.data1)
over(
PARTITION BY item_codes.item_code) AS StandardDev,
Percentile_cont(0.5)
within GROUP (ORDER BY oc_vdata.data1) over (
PARTITION BY item_codes.item_code) AS Median
FROM oc_vdata
inner join item_codes
ON Left(oc_vdata.partno, 12) = item_codes.spec_no
Minimum steps to reproduce the error:
SELECT item_codes.item_code,
Avg(oc_vdata.data1) AS Mean,
Stdevp(oc_vdata.data1) AS StandardDev
FROM oc_vdata
INNER JOIN item_codes
ON LEFT(oc_vdata.partno, 12) = item_codes.spec_no
GROUP BY item_codes.item_code
ORDER BY oc_vdata.data1 -- This will cause the error
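To see why the DISTINCT plus window-function rewrite returns one row per item, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows are made up, and because SQLite has no PERCENTILE_CONT or STDEVP, MAX is used as a second statistic purely for illustration. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data; the table and column names only mirror the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oc_vdata (item_code TEXT, data1 REAL);
    INSERT INTO oc_vdata VALUES
        ('A', 1.0), ('A', 2.0), ('A', 6.0),
        ('B', 10.0), ('B', 20.0);
""")

# Every statistic is a window function over the same partition, so all rows of
# a partition carry identical values and DISTINCT leaves one row per item_code.
windowed = conn.execute("""
    SELECT DISTINCT item_code,
           AVG(data1) OVER (PARTITION BY item_code) AS mean,
           MAX(data1) OVER (PARTITION BY item_code) AS largest
    FROM oc_vdata
    ORDER BY item_code
""").fetchall()

# The plain GROUP BY form, for comparison.
grouped = conn.execute("""
    SELECT item_code, AVG(data1) AS mean, MAX(data1) AS largest
    FROM oc_vdata
    GROUP BY item_code
    ORDER BY item_code
""").fetchall()

print(windowed)
print(grouped)
```

Both queries return the same rows here. In SQL Server only the windowed form can also carry PERCENTILE_CONT, which is why the rewrite moves every statistic into an OVER clause.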

Related

SQL aggregated subquery - Athena

Using AWS Athena, I want to get the total recovered per day, calculated as total recovered amount / total advances.
Here is the code:
SELECT a.advance_date
,sum(a.advance_amount) as "advance_amount"
,sum(a.advance_fee) as "advance_fee"
,(SELECT
sum(credit_recovered+fee_recovered) / (a.advance_amount+a.advance_fee)
FROM ncmxmy.ageing_recovery_raw_parquet
WHERE advance_date = a.advance_date
AND date(recovery_date) <= DATE_ADD('day', 0, a.advance_date)
) as "day_0"
FROM ageing_summary_advance_parquet a
GROUP BY a.advance_date
ORDER BY a.advance_date
I am getting an error
"("sum"((credit_recovered + fee_recovered)) / (a.advance_amount + a.advance_fee))' must be an aggregate expression or appear in GROUP BY clause"
Your division gives the error because the denominator uses the individual advance_amount and advance_fee columns of the ageing_summary_advance_parquet table, which are neither aggregated nor in the GROUP BY. As I understand the query, you need to divide by the grouped sums of the advance_amount and advance_fee columns. In that case, we can aggregate each table by advance_date separately and join the two grouped sets for the division. Please let me know if this query helps:
WITH cte1 (sum_adv_date, advance_date) as
(SELECT
sum(credit_recovered+fee_recovered) as sum_adv_date, advance_date
FROM ncmxmy.ageing_recovery_raw_parquet
WHERE date(recovery_date) <= DATE_ADD('day', 0, advance_date)
GROUP BY advance_date
),
cte2 (advance_date, advance_amount, advance_fee) as
(SELECT
a.advance_date
,sum(a.advance_amount) as "advance_amount"
,sum(a.advance_fee) as "advance_fee"
FROM ageing_summary_advance_parquet a
GROUP BY a.advance_date
)
SELECT cte2.advance_amount, cte2.advance_fee,
(cte1.sum_adv_date/(cte2.advance_amount+cte2.advance_fee)) as "day_0"
FROM cte1 inner join cte2 on cte1.advance_date = cte2.advance_date
ORDER BY cte1.advance_date
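The aggregate-each-side-then-join pattern can be tried without Athena. The sketch below uses Python's sqlite3 as a stand-in; the table names advances and recoveries and all sample values are hypothetical, and the recovery_date filter from the question is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of the two tables in the question.
    CREATE TABLE advances (advance_date TEXT, advance_amount REAL, advance_fee REAL);
    CREATE TABLE recoveries (advance_date TEXT, credit_recovered REAL, fee_recovered REAL);
    INSERT INTO advances VALUES
        ('2022-01-01', 100, 10), ('2022-01-01', 80, 10), ('2022-01-02', 50, 0);
    INSERT INTO recoveries VALUES
        ('2022-01-01', 40, 5), ('2022-01-01', 50, 5), ('2022-01-02', 10, 2.5);
""")

# Each CTE reduces its table to one row per advance_date, so the final
# division only ever sees already-aggregated values.
rows = conn.execute("""
    WITH recovered AS (
        SELECT advance_date,
               SUM(credit_recovered + fee_recovered) AS total_recovered
        FROM recoveries
        GROUP BY advance_date
    ),
    advanced AS (
        SELECT advance_date,
               SUM(advance_amount + advance_fee) AS total_advanced
        FROM advances
        GROUP BY advance_date
    )
    SELECT a.advance_date,
           r.total_recovered / a.total_advanced AS day_0
    FROM advanced a
    JOIN recovered r ON r.advance_date = a.advance_date
    ORDER BY a.advance_date
""").fetchall()

print(rows)
```

An inner join drops dates that have advances but no recoveries; a LEFT JOIN from the advances side would keep them with a NULL ratio, if that is what the report needs.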

calculate time difference of consecutive row dates in SQL

Hello, I am trying to calculate the time difference between two consecutive rows for the Date column (in either hours or days), as shown in the attached image.
Highlighted in yellow is the result I want, which is basically the difference between the date in that row and the date in the row above.
How can we achieve this in SQL? Below is my complex code, which has the rest of the fields in it:
with cte
as
(
select m.voucher_no, CONVERT(VARCHAR(30),CONVERT(datetime, f.action_Date, 109),100) as action_date,f.col1_Value,f.col3_value,f.col4_value,f.comments,f.distr_user,f.wf_status,f.action_code,f.wf_user_id
from attdetailmap m
LEFT JOIN awftaskfin f ON f.oid = m.oid and f.client ='PC'
where f.action_Date !='' and action_date between '$?datef' and '$?datet'
),
-- select *, ROW_NUMBER() OVER(PARTITION BY action_Date,distr_user,wf_Status,wf_user_id order by action_Date,distr_user,wf_Status,wf_user_id ) as row_no_1 from cte
cte2 as
(
select *, ROW_NUMBER() OVER(PARTITION BY voucher_no,action_Date,distr_user,wf_Status,wf_user_id order by voucher_no ) as row_no_1 from cte
)
select distinct(v.dim_value) as resid,c.voucher_no,CONVERT(datetime, c.action_Date, 109) as action_Date,c.col4_value,c.comments,c.distr_user,v.description,c.wf_status,c.action_code, c.wf_user_id,v1.description as name,r.rel_value as pay_office,r1.rel_value as site
from cte2 c
LEFT OUTER JOIN aagviuserdetail v ON v.user_id = c.distr_user
LEFT OUTER JOIN aagviuserdetail v1 ON v1.user_id = c.wf_user_id
LEFT OUTER JOIN ahsrelvalue r ON r.resource_id = v.dim_Value and r.rel_Attr_id = 'P1' and r.period_to = '209912'
LEFT OUTER JOIN ahsrelvalue r1 ON r1.resource_id = v.dim_Value and r1.rel_Attr_id = 'Z1' and r1.period_to = '209912'
where c.row_no_1 = '1' and r.rel_value like '$?site1' and voucher_no like '$?trans'
order by voucher_no,action_Date
The key idea is lag(). However, date/time functions vary among databases. So, the idea is:
select t.*,
(date - lag(date) over (partition by transaction_no order by date)) as diff
from t;
I should note that this exact syntax might not work in your database, because the - operator may not even be defined on date/time values. However, lag() is a standard function and should be available.
For instance, in SQL Server, this would look like:
select t.*,
datediff(second, lag(date) over (partition by transaction_no order by date), date) / (24.0 * 60 * 60) as diff_days
from t;
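For a quick way to try the lag() idea end to end, here is a small sketch using Python's sqlite3 instead of SQL Server. The table t, its columns and the sample rows are hypothetical, and julianday() is SQLite's way of turning a timestamp into a day count, playing the role that datediff() plays above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows: three actions on voucher V1, one on V2.
    CREATE TABLE t (transaction_no TEXT, action_date TEXT);
    INSERT INTO t VALUES
        ('V1', '2021-03-01 09:00:00'),
        ('V1', '2021-03-01 21:00:00'),
        ('V1', '2021-03-03 21:00:00'),
        ('V2', '2021-03-05 00:00:00');
""")

# LAG() fetches the previous row's date within the same transaction_no;
# the first row of each partition has no predecessor, so its diff is NULL.
rows = conn.execute("""
    SELECT transaction_no,
           action_date,
           julianday(action_date)
             - LAG(julianday(action_date)) OVER (
                   PARTITION BY transaction_no
                   ORDER BY action_date
               ) AS diff_days
    FROM t
    ORDER BY transaction_no, action_date
""").fetchall()

for row in rows:
    print(row)
```

Multiplying diff_days by 24 gives the difference in hours.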

SQL Calculate 20 business days, remove bank hols and reference to another query

Sorry, I am quite new to creating functions in SQL Server 2008 R2. I have largely been able to get by using T-SQL statements.
However, I need to create a report that returns records with a date (program start date). That part is simple enough, but for each row's date I want to calculate a target completion date based on 20 business days, without counting bank holidays. I have a table named dCalendar which holds every day for the last few years, with flags saying whether each day is a workday or a bank holiday.
I have found lots of material on how to calculate the number of business days between two dates, but this is trickier.
I have created this function:
ALTER function [warehouse].[MS_fnAddBusinessDays]
(@StartDate datetime,
@nDays int)
RETURNS TABLE
AS
RETURN
(SELECT calDt
FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY calDt ASC) AS rownumber,
calDt
FROM
warehouse.dCalendar
WHERE
(calDt >= @StartDate)
AND (weekDayFg = 1)
AND (BankHolidayFg = 0)) AS Results
WHERE
(rownumber = @nDays))
and can call it using the following
SELECT *
FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY calDt ASC) AS rownumber,
calDt, BankHolidayFg, weekDayFg, dateStr
FROM
warehouse.dCalendar
WHERE
(calDt >= CONVERT(DATETIME, '2016-12-11 00:00:00', 102))
AND (weekDayFg = 1) AND (BankHolidayFg = 0)) AS TblResults
WHERE
(rownumber = 20)
I just cannot work out how to embed this within the following example, where progStartDate is the date I want to calculate the target date from:
SELECT
end_user.fContactVwDn.client_no,
end_user.fProgrammeVwDn.prtyProgrammeType,
end_user.fProgrammeVwDn.progStartDate,
SUM(CASE WHEN comtContactMeetingType = 'Visit' THEN 1 ELSE 0 END) AS InitialVisitTotal,
MAX(CASE WHEN comtContactMeetingType = 'Visit' THEN contPlannedFromDate ELSE 0 END) AS InitialVisitDate
FROM
end_user.fContactVwDn
INNER JOIN
warehouse.fContactProgramme ON end_user.fContactVwDn.contKy = warehouse.fContactProgramme.contKy
INNER JOIN
end_user.fProgrammeVwDn ON warehouse.fContactProgramme.progGuid = end_user.fProgrammeVwDn.progprogGuid
GROUP BY
end_user.fProgrammeVwDn.prtyProgrammeType,
end_user.fProgrammeVwDn.progStartDate, end_user.fContactVwDn.client_no
HAVING
(end_user.fProgrammeVwDn.prtyProgrammeType = 'Application')
AND (end_user.fProgrammeVwDn.progStartDate > CONVERT(DATETIME, '2016-07-01 00:00:00', 102))
Any help would be much appreciated.
The function looks OK. I would add TOP, though, to avoid scanning the whole calendar table. I hope calDt is the primary key.
ALTER function [warehouse].[MS_fnAddBusinessDays]
(@StartDate datetime,
@nDays int)
RETURNS TABLE
AS
RETURN
(
SELECT calDt
FROM
(
SELECT TOP(@nDays)
ROW_NUMBER() OVER (ORDER BY calDt ASC) AS rownumber,
calDt
FROM
warehouse.dCalendar
WHERE
(calDt >= @StartDate)
AND (weekDayFg = 1)
AND (BankHolidayFg = 0)
ORDER BY calDt ASC
) AS Results
WHERE
rownumber = @nDays
)
This is an inline table-valued function, which means that it returns a table, not a scalar value. This is good, because scalar functions usually make queries slow, whereas inline (single-statement) table-valued functions can be inlined by the optimiser.
To call such a function, use CROSS APPLY. It was originally introduced to SQL Server specifically for calling table-valued functions, but it can do much more (it is a so-called lateral join).
I wrapped your original query in a CTE to make the final query readable.
WITH
CTE
AS
(
SELECT
end_user.fContactVwDn.client_no,
end_user.fProgrammeVwDn.prtyProgrammeType,
end_user.fProgrammeVwDn.progStartDate,
SUM(CASE WHEN comtContactMeetingType = 'Visit' THEN 1 ELSE 0 END) AS InitialVisitTotal,
MAX(CASE WHEN comtContactMeetingType = 'Visit' THEN contPlannedFromDate ELSE 0 END) AS InitialVisitDate
FROM
end_user.fContactVwDn
INNER JOIN warehouse.fContactProgramme ON end_user.fContactVwDn.contKy = warehouse.fContactProgramme.contKy
INNER JOIN end_user.fProgrammeVwDn ON warehouse.fContactProgramme.progGuid = end_user.fProgrammeVwDn.progprogGuid
GROUP BY
end_user.fProgrammeVwDn.prtyProgrammeType,
end_user.fProgrammeVwDn.progStartDate,
end_user.fContactVwDn.client_no
HAVING
(end_user.fProgrammeVwDn.prtyProgrammeType = 'Application')
AND (end_user.fProgrammeVwDn.progStartDate > CONVERT(DATETIME, '2016-07-01 00:00:00', 102))
)
SELECT
CTE.client_no,
CTE.prtyProgrammeType,
CTE.progStartDate,
CTE.InitialVisitTotal,
CTE.InitialVisitDate,
F.calDt
FROM
CTE
CROSS APPLY [warehouse].[MS_fnAddBusinessDays](CTE.progStartDate, 20) AS F
;
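The per-row lookup that CROSS APPLY performs can be tried on a small calendar. The sketch below uses Python's sqlite3 as a stand-in for SQL Server: SQLite has neither CROSS APPLY nor TOP, so a correlated subquery with LIMIT/OFFSET plays the role of the function call. The calendar range, the three bank holidays, the programme table and the choice of 5 days instead of 20 are all assumptions made to keep the example short.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dCalendar ("
    "calDt TEXT PRIMARY KEY, weekDayFg INTEGER, BankHolidayFg INTEGER)"
)

# Hypothetical calendar: 19 Dec 2016 to 10 Jan 2017 with three bank holidays.
holidays = {date(2016, 12, 26), date(2016, 12, 27), date(2017, 1, 2)}
day = date(2016, 12, 19)
while day <= date(2017, 1, 10):
    conn.execute(
        "INSERT INTO dCalendar VALUES (?, ?, ?)",
        (day.isoformat(), int(day.weekday() < 5), int(day in holidays)),
    )
    day += timedelta(days=1)

# Hypothetical stand-in for the grouped programme query in the question.
conn.execute("CREATE TABLE programme (client_no INTEGER, progStartDate TEXT)")
conn.executemany(
    "INSERT INTO programme VALUES (?, ?)",
    [(1, "2016-12-22"), (2, "2016-12-30")],
)

N_DAYS = 5  # the question uses 20; 5 keeps the sample calendar small

# For each programme row, skip N_DAYS - 1 qualifying days and take the next one.
# Like the original function, the start date itself counts as day 1.
rows = conn.execute("""
    SELECT p.client_no,
           p.progStartDate,
           (SELECT c.calDt
            FROM dCalendar c
            WHERE c.calDt >= p.progStartDate
              AND c.weekDayFg = 1
              AND c.BankHolidayFg = 0
            ORDER BY c.calDt
            LIMIT 1 OFFSET :skip) AS targetDate
    FROM programme p
    ORDER BY p.client_no
""", {"skip": N_DAYS - 1}).fetchall()

for row in rows:
    print(row)
```

If the calendar does not reach far enough ahead, the subquery returns NULL for that row, whereas CROSS APPLY would drop the row entirely (OUTER APPLY keeps it).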

ROW_NUMBER() OVER (PARTITION BY) showing duplicate results for Group By Clause

I have the query below, which was created to show the sum of the "last" values for each year. Usually this is the December value, but the year could potentially end in any month, so I want to add together the last values for each GoalMonteCarloHeaderID. I have it working 99%, but there are some random duplicates in the [year] value.
WITH endBalances AS (
SELECT ROW_NUMBER() OVER (PARTITION By GoalMonteCarloHeaderID, Year(Convert(date,MonthDate)) Order By Max(Month(Convert(date,MonthDate))) desc) n, Max(Month(Convert(date,MonthDate))) maxMonth, GrowthBucket, WithdrawalBucket, NoTaxesBucket,
Year(MonthDate) [year]
From GoalMonteCarloMedianResults mcmr
full join GoalMonteCarloHeader mch on mch.ID = mcmr.GoalMonteCarloHeaderID
full join GoalChartData gcd on gcd.ID = mch.GoalChartDataID and gcd.TypeID = 2
inner join Goal g on g.iGoalID = gcd.GoalID
where g.iTypeID in (1) and g.iHHID = 850802
group by GoalMonteCarloHeaderID, MonthDate, GrowthBucket, WithdrawalBucket, NoTaxesBucket
)
SELECT [year], Sum(GrowthBucket) GrowthBucket, Sum(WithdrawalBucket) WithdrawalBucket,Sum(NoTaxesBucket) NoTaxesBucket, maxMonth
From endBalances
where [year] is not null and n=1
Group By [year], maxMonth
order by [year] asc
The database result shows two seemingly random duplicates.
In the image you can see two examples where the year is duplicated and displayed for more than just the 'last' month in the year. Am I doing something wrong with the GROUP BY or the PARTITION BY in my query? I am not the most familiar with this functionality of T-SQL.
T-SQL has a lovely function for this which has no direct equivalent in MySQL.
ROW_NUMBER() OVER (PARTITION BY [year] ORDER BY MonthDate DESC) AS rn
Then anything with rn=1 will be the last entry in a year.
The answers to this question have a few ideas:
ROW_NUMBER() in MySQL
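As a rough illustration of the rn = 1 idea, here is a sketch in Python's sqlite3 (standing in for SQL Server). The table results, its columns and the sample values are hypothetical. It partitions by header and year, on the assumption that the "last" row is wanted per GoalMonteCarloHeaderID within each year, and the outer query groups by year only, since also grouping by the last month is one likely source of repeated years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical monthly results for two headers; header 2 ends 2020
    -- in December too, but the data would work with any final month.
    CREATE TABLE results (header_id INTEGER, month_date TEXT, growth REAL);
    INSERT INTO results VALUES
        (1, '2020-11-01', 100), (1, '2020-12-01', 110),
        (1, '2021-06-01', 120),
        (2, '2020-10-01', 50),  (2, '2020-12-01', 60),
        (2, '2021-05-01', 65),  (2, '2021-06-01', 70);
""")

# rn = 1 marks the latest row of each (header, year) pair; summing only those
# rows per year yields exactly one output row per year.
rows = conn.execute("""
    WITH ranked AS (
        SELECT header_id,
               CAST(strftime('%Y', month_date) AS INTEGER) AS yr,
               growth,
               ROW_NUMBER() OVER (
                   PARTITION BY header_id, strftime('%Y', month_date)
                   ORDER BY month_date DESC
               ) AS rn
        FROM results
    )
    SELECT yr, SUM(growth) AS growth
    FROM ranked
    WHERE rn = 1
    GROUP BY yr
    ORDER BY yr
""").fetchall()

print(rows)
```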

Sql Server - Joining subqueries using calculated fields

I am trying to calculate the percentage change in price between days. As the days are not consecutive, I built into the query a calculated field that tells me what relative day it is (day 1, day 2, etc.). In order to compare today with yesterday, I offset the calculated day number by 1 in a subquery. What I want to do is join the inner and outer query on the calculated relative day. The code I came up with is:
SELECT TOP 11
P.Date,
(AVG(P.SettlementPri) - PriceY) / PriceY as PriceChange,
P.Symbol,
(RANK() OVER (ORDER BY P.Date desc)) as dayrank_Today
FROM OTE P
JOIN (SELECT TOP 11
C.Date,
AVG(SettlementPri) as PriceY,
(RANK() OVER (ORDER BY C.Date desc))+1 as dayrank_Yest
FROM OTE C
WHERE C.ComCode = 'C-'
GROUP BY c.Date) C ON dayrank_Today = C.dayrank_Yest
WHERE P.ComCode = 'C-'
GROUP BY P.Symbol, P.Date
If I try to execute the query, I get an error message indicating dayrank_Today is an invalid column. I have tried renaming it, qualifying it, yelling obscenities at it, and I get squat. Still an error.
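You can't define a calculated column in the SELECT list and then use its alias in a join of the same query, because the FROM and JOIN clauses are evaluated before the SELECT list. You can use CTEs, which I'm not so familiar with, or you can just select from derived tables like so: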
SELECT
P.Date,
(AVG(AvgPrice) - C.PriceY) / C.PriceY as PriceChange,
P.Symbol,
P.dayrank_Today FROM
(SELECT TOP 11
ComCode,
Date,
AVG(SettlementPri) as AvgPrice,
Symbol,
(RANK() OVER (ORDER BY Date desc)) as dayrank_Today
FROM OTE WHERE ComCode = 'C-') P
JOIN (SELECT TOP 11
C.Date,
AVG(SettlementPri) as PriceY,
(RANK() OVER (ORDER BY C.Date desc))+1 as dayrank_Yest
FROM OTE C
WHERE C.ComCode = 'C-'
GROUP BY c.Date) C ON dayrank_Today = C.dayrank_Yest
GROUP BY P.Symbol, P.Date
If possible consider using a CTE as it makes it very easy. Something like this:
With Raw as
(
SELECT TOP 11 C.Date,
Avg(SettlementPri) As PriceY,
Rank() OVER (ORDER BY C.Date desc) as dayrank
FROM OTE C WHERE C.Comcode = 'C-'
Group by C.Date
)
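select today.pricey as todayprice ,
yesterday.pricey as yesterdayprice,
-- percentage change relative to the previous day's price
(today.pricey - yesterday.pricey)/yesterday.pricey * 100 as percentchange
from Raw today
-- dayrank is ordered by date descending, so the previous day has the next higher rank
left outer join Raw yesterday on yesterday.dayrank = today.dayrank + 1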
Obviously this doesn't include the symbol, but that can be included pretty easily.
If the 'With' syntax doesn't suit, you can also use calculated fields with Outer Apply: http://technet.microsoft.com/en-us/library/ms175156.aspx
That said, the CTE means you only need to write your price calculation once, which is a lot cleaner.
Cheers
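To try the CTE self-join on real rows, here is a sketch using Python's sqlite3 as a stand-in for SQL Server. The sample prices are made up and TOP 11 is dropped because SQLite does not support it. Since the rank is ordered by date descending, the most recent day has rank 1 and the previous trading day has the next higher rank, so the join looks for yesterday at today's rank plus one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical prices on three non-consecutive trading days.
    CREATE TABLE OTE (Date TEXT, ComCode TEXT, SettlementPri REAL);
    INSERT INTO OTE VALUES
        ('2014-01-02', 'C-', 100), ('2014-01-02', 'C-', 100),
        ('2014-01-03', 'C-', 110),
        ('2014-01-07', 'C-', 99);
""")

# The rank is computed once in the CTE, so both sides of the self-join can
# refer to it by name, which a plain SELECT alias does not allow.
rows = conn.execute("""
    WITH daily AS (
        SELECT Date,
               AVG(SettlementPri) AS price,
               RANK() OVER (ORDER BY Date DESC) AS dayrank
        FROM OTE
        WHERE ComCode = 'C-'
        GROUP BY Date
    )
    SELECT today.Date,
           (today.price - yesterday.price) / yesterday.price * 100 AS pct_change
    FROM daily today
    JOIN daily yesterday ON yesterday.dayrank = today.dayrank + 1
    ORDER BY today.Date
""").fetchall()

for row in rows:
    print(row)
```

The oldest day has no predecessor, so the inner join leaves it out; a LEFT JOIN would keep it with a NULL change.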
I had the same problem, found this thread, and then found a solution, so I thought I'd post it here.
Instead of using the column name as the parameter for ON, copy the expression that gave you the column in the first place:
replace:
ON dayrank_Today = C.dayrank_Yest
with:
ON (RANK() OVER (ORDER BY Date desc)) = C.dayrank_Yest
Granted, you're displeasing the Programming Gods by violating DRY, but you could be pragmatic and mention the duplication in the comments, which should appease their wrath to a mild grumbling.