SQL aggregated subquery - Athena - sql

Using AWS Athena, I want to get the share recovered per day, computed as total recovered amount / total advances.
Here is the code:
SELECT a.advance_date
,sum(a.advance_amount) as "advance_amount"
,sum(a.advance_fee) as "advance_fee"
,(SELECT
sum(credit_recovered+fee_recovered) / (a.advance_amount+a.advance_fee)
FROM ncmxmy.ageing_recovery_raw_parquet
WHERE advance_date = a.advance_date
AND date(recovery_date) <= DATE_ADD('day', 0, a.advance_date)
) as "day_0"
FROM ageing_summary_advance_parquet a
GROUP BY a.advance_date
ORDER BY a.advance_date
I am getting this error:
'("sum"((credit_recovered + fee_recovered)) / (a.advance_amount + a.advance_fee))' must be an aggregate expression or appear in GROUP BY clause

Your division gives the error because its denominator references individual, ungrouped columns of the ageing_summary_advance_parquet table inside a grouped query. As I read the query, you want to divide by the grouped sums of the advance_amount and advance_fee columns. In that case, aggregate each table by advance_date separately and join the two grouped result sets before dividing. Please let me know if this query helps:
WITH cte1 (sum_adv_date, advance_date) as
(SELECT
sum(credit_recovered+fee_recovered) as sum_adv_date, advance_date
FROM ncmxmy.ageing_recovery_raw_parquet
WHERE date(recovery_date) <= DATE_ADD('day', 0, advance_date)
GROUP BY advance_date
),
cte2 (advance_date, advance_amount, advance_fee) as
(SELECT
a.advance_date
,sum(a.advance_amount) as "advance_amount"
,sum(a.advance_fee) as "advance_fee"
FROM ageing_summary_advance_parquet a
GROUP BY a.advance_date
)
SELECT cte2.advance_date, cte2.advance_amount, cte2.advance_fee,
(cte1.sum_adv_date/(cte2.advance_amount+cte2.advance_fee)) as "day_0"
FROM cte1 inner join cte2 on cte1.advance_date = cte2.advance_date
ORDER BY cte1.advance_date
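To check the shape of this approach without an Athena account, here is a minimal sketch that runs the same "aggregate each side, then join" pattern against an in-memory SQLite database. The table names advances and recoveries and the sample rows are made up for illustration, and SQLite's date() stands in for Athena's DATE_ADD('day', 0, ...), so treat it as an approximation rather than the exact Athena query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the two parquet tables in the question.
    CREATE TABLE advances (advance_date TEXT, advance_amount REAL, advance_fee REAL);
    CREATE TABLE recoveries (advance_date TEXT, recovery_date TEXT,
                             credit_recovered REAL, fee_recovered REAL);
    INSERT INTO advances VALUES
        ('2023-01-01', 100, 10), ('2023-01-01', 200, 20), ('2023-01-02', 50, 5);
    INSERT INTO recoveries VALUES
        ('2023-01-01', '2023-01-01', 60, 6),
        ('2023-01-01', '2023-01-03', 30, 3),
        ('2023-01-02', '2023-01-02', 11, 0);
""")

rows = conn.execute("""
    WITH rec AS (              -- recovered on or before the advance date
        SELECT advance_date, SUM(credit_recovered + fee_recovered) AS recovered
        FROM recoveries
        WHERE date(recovery_date) <= date(advance_date)
        GROUP BY advance_date
    ),
    adv AS (                   -- advances grouped to one row per date
        SELECT advance_date,
               SUM(advance_amount) AS advance_amount,
               SUM(advance_fee)    AS advance_fee
        FROM advances
        GROUP BY advance_date
    )
    SELECT adv.advance_date,
           rec.recovered / (adv.advance_amount + adv.advance_fee) AS day_0
    FROM adv
    JOIN rec ON rec.advance_date = adv.advance_date
    ORDER BY adv.advance_date
""").fetchall()

for advance_date, day_0 in rows:
    print(advance_date, round(day_0, 4))
```

Because both sides are already reduced to one row per advance_date before the join, the division only ever sees aggregated values, which is what the original error message was asking for. Note that an inner join drops dates with no recoveries; a LEFT JOIN from the advances side would keep them with a NULL ratio.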

Related

Assistance with PERCENTILE_CONT function and GROUP By error

All,
I am having problems with the below query. I am trying to get stat data from our database for the last 3 years but I keep getting the error message:
***Column 'OC_VDATA.DATA1' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.***
I know it has something to do with the DATA1 column but I am not familiar enough using the PERCENTILE_CONT function to know what the solution is.
Anyone have any ideas?
WITH Q AS
(
SELECT stagingPLM.dbo.ITEM_CODES.ITEM_CODE,
AVG(OC_VDATA.DATA1) AS Mean,
STDEVP(OC_VDATA.DATA1) AS StandardDev,
PERCENTILE_CONT(0.5)
WITHIN GROUP (ORDER BY OC_VDATA.DATA1)
OVER (PARTITION BY stagingPLM.dbo.ITEM_CODES.ITEM_CODE) AS Median
FROM OC_VDATA INNER JOIN
OC_VDAT_AUX ON OC_VDATA.PARTNO = OC_VDAT_AUX.PARTNOAUX
AND OC_VDATA.DATETIME = OC_VDAT_AUX.DATETIMEAUX INNER JOIN
stagingPLM.dbo.ITEM_CODES ON LEFT(OC_VDATA.PARTNO, 12) = stagingPLM.dbo.ITEM_CODES.SPEC_NO
AND LEFT(OC_VDAT_AUX.PARTNOAUX, 12) = stagingPLM.dbo.ITEM_CODES.SPEC_NO
WHERE (OC_VDAT_AUX.UDL28 LIKE '%PLASTIC%')
AND (RIGHT(OC_VDATA.PARTNO, 6) = '036150')
AND (CAST(OC_VDAT_AUX.UDL40 AS DATETIME)
BETWEEN CONVERT(datetime, '2019-05-18 00:00:00', 102) AND CONVERT(datetime, '2022-05-18 00:00:00', 102))
GROUP BY stagingPLM.dbo.ITEM_CODES.ITEM_CODE
)
SELECT * FROM Q
The error comes from the code WITHIN GROUP (ORDER BY OC_VDATA.DATA1).
You are grouping by ITEM_CODE (for AVG and STDEVP), whereas PERCENTILE_CONT is a window function that orders by OC_VDATA.DATA1, a column that is neither grouped nor aggregated.
It is better to calculate AVG, STDEVP and PERCENTILE_CONT all as window functions, instead of half through GROUP BY and half through a window function.
Reducing the query to the minimum columns required to reproduce the issue, you can rewrite it as below to get the desired output.
SELECT DISTINCT item_codes.item_code,
Avg(oc_vdata.data1)
over(
PARTITION BY item_codes.item_code) AS Mean,
Stdevp(oc_vdata.data1)
over(
PARTITION BY item_codes.item_code) AS StandardDev,
Percentile_cont(0.5)
within GROUP (ORDER BY oc_vdata.data1) over (
PARTITION BY item_codes.item_code) AS Median
FROM oc_vdata
inner join item_codes
ON Left(oc_vdata.partno, 12) = item_codes.spec_no
Minimum steps to reproduce the error:
SELECT item_codes.item_code,
Avg(oc_vdata.data1) AS Mean,
Stdevp(oc_vdata.data1) AS StandardDev
FROM oc_vdata
INNER JOIN item_codes
ON LEFT(oc_vdata.partno, 12) = item_codes.spec_no
GROUP BY item_codes.item_code
ORDER BY oc_vdata.data1 -- This will cause the error
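The claim that SELECT DISTINCT over window aggregates returns the same rows as GROUP BY can be checked with a small sketch. It uses an in-memory SQLite database (window functions need SQLite 3.25 or newer) and a hypothetical readings table; SQLite has no PERCENTILE_CONT or STDEVP, so AVG and COUNT stand in for the aggregates here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: several readings per item code.
    CREATE TABLE readings (item_code TEXT, data1 REAL);
    INSERT INTO readings VALUES
        ('A', 1), ('A', 2), ('A', 6), ('B', 10), ('B', 20);
""")

# Classic aggregation: one row per group.
grouped = conn.execute("""
    SELECT item_code, AVG(data1) AS mean, COUNT(*) AS n
    FROM readings
    GROUP BY item_code
    ORDER BY item_code
""").fetchall()

# Window aggregation: every input row carries its group's value,
# so DISTINCT is needed to collapse the repeats.
windowed = conn.execute("""
    SELECT DISTINCT item_code,
           AVG(data1) OVER (PARTITION BY item_code) AS mean,
           COUNT(*)   OVER (PARTITION BY item_code) AS n
    FROM readings
    ORDER BY item_code
""").fetchall()

print(grouped)
print(windowed)
```

Both queries yield one row per item code with identical values, which is why moving every statistic to the windowed form lets a window-only function sit next to the others without a GROUP BY.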

Not able to run a simple beam pipeline

I have a simple SQL query with some aggregations. There is no problem with the query itself, but when I look at its execution plan I cannot tell which parts of the query the aggregations in the plan come from:
Table:
Query (it uses string operations, GROUP BY, ORDER BY and a join; its purpose is to find the reporting periods in which the total amount grew by more than a certain target year over year):
WITH cte
AS (SELECT Year(orderdate) AS yr,
Month(orderdate) AS mon,
Ltrim(Rtrim(Str(Year(orderdate)))) + '-'
+ Ltrim(Rtrim(Str(Month(orderdate)))) AS theMonth,
Sum(totalamount) AS theAmount
FROM [order]
GROUP BY Year(orderdate),
Month(orderdate),
Ltrim(Rtrim(Str(Year(orderdate)))) + '-'
+ Ltrim(Rtrim(Str(Month(orderdate)))))
SELECT TOP 3 cte.themonth,
cte_prev.themonth AS thePrevMonth,
cte.theamount,
cte_prev.theamount AS thePrevAmount,
( cte.theamount - cte_prev.theamount ) AS diff
FROM cte
JOIN cte cte_prev
ON cte.yr = cte_prev.yr + 1
AND cte.mon = cte_prev.mon
WHERE ( cte.theamount - cte_prev.theamount ) / cte_prev.theamount > 0.8
ORDER BY ( cte.theamount - cte_prev.theamount ) / cte_prev.theamount DESC
Execution plan:
How can I write a better/simpler query to calculate the difference between two reporting periods? Also, the string trimming is really annoying here: why is there no single trim function, so that I have to combine LTRIM and RTRIM?

Teradata spool space issue on running a sub query with Count

I am using the query below to calculate the business days between two dates for all the order numbers. Business days are already available in the Teradata table Common_WorkingCalendar. However, I am facing a spool space issue when I execute the query, even though I have ample space available in my data lab. I need to optimize the query and would appreciate any input.
SELECT
tx."OrderNumber",
(SELECT COUNT(1) FROM Common_WorkingCalendar
WHERE CalDate between Cast(tx."TimeStamp" as date) and Cast(mf.ShipDate as date)) as BusDays
from StoreFulfillment ff
inner join StoreTransmission tx
on tx.OrderNumber = ff.OrderNumber
inner join StoreMerchandiseFulfillment mf
on mf.OrderNumber = ff.OrderNumber
This is a very inefficient way to get this count, and it results in a product join.
The recommended approach is to add a sequential number to your calendar that increases only on business days (calculated using SUM(CASE WHEN businessDay THEN 1 ELSE 0 END) OVER (ORDER BY CalDate ROWS UNBOUNDED PRECEDING)); then it takes two joins, one for the start date and one for the end date.
If this calculation is needed a lot, you had better add a new column; otherwise you can do it on the fly:
WITH cte AS
(
SELECT CalDate,
-- as this table only contains business days you can use this instead
ROW_NUMBER() OVER (ORDER BY CalDate) AS DayNo
FROM Common_WorkingCalendar
)
SELECT
tx."OrderNumber",
to_dt.DayNo - from_dt.DayNo AS BusDays
FROM StoreFulfillment ff
INNER JOIN StoreTransmission tx
ON tx.OrderNumber = ff.OrderNumber
INNER JOIN StoreMerchandiseFulfillment mf
ON mf.OrderNumber = ff.OrderNumber
JOIN cte AS from_dt
ON from_dt.CalDate = Cast(tx."TimeStamp" AS DATE)
JOIN cte AS to_dt
ON to_dt.CalDate = Cast(mf.ShipDate AS DATE)
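The numbering technique above can be tried on a tiny data set. This is a sketch against in-memory SQLite (3.25 or newer for ROW_NUMBER), not Teradata; the working_calendar and orders tables and their rows are hypothetical. It also computes the original correlated COUNT next to the day-number difference so the two can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Calendar holding business days only: Mon 2023-01-02 to Fri 2023-01-06,
    -- then Mon 2023-01-09 (the weekend is simply absent).
    CREATE TABLE working_calendar (cal_date TEXT PRIMARY KEY);
    INSERT INTO working_calendar VALUES
        ('2023-01-02'), ('2023-01-03'), ('2023-01-04'),
        ('2023-01-05'), ('2023-01-06'), ('2023-01-09');

    -- Hypothetical orders with a start and an end date.
    CREATE TABLE orders (order_number INTEGER, placed TEXT, shipped TEXT);
    INSERT INTO orders VALUES
        (1, '2023-01-03', '2023-01-09'),
        (2, '2023-01-05', '2023-01-05');
""")

rows = conn.execute("""
    WITH cal AS (
        -- Sequential number that advances once per business day.
        SELECT cal_date, ROW_NUMBER() OVER (ORDER BY cal_date) AS day_no
        FROM working_calendar
    )
    SELECT o.order_number,
           t.day_no - f.day_no AS bus_days_diff,
           -- The original correlated count, kept only for comparison.
           (SELECT COUNT(*) FROM working_calendar c
             WHERE c.cal_date BETWEEN o.placed AND o.shipped) AS bus_days_inclusive
    FROM orders o
    JOIN cal f ON f.cal_date = o.placed
    JOIN cal t ON t.cal_date = o.shipped
    ORDER BY o.order_number
""").fetchall()

print(rows)
```

In this sample the difference of day numbers is one less than the inclusive BETWEEN count, so add 1 if both endpoints should be counted. Also note that with inner joins an order whose start or end date is not in the calendar (for example a weekend) drops out of the result, which may or may not be what you want.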

SQL Sum() returning positive and negative values

I'm trying to get SUM() to return a single total for a column that holds both positive and negative values. Instead, it's currently returning one positive value and one negative value. Can anyone help?
SELECT
LedgerAP.Period, LedgerAP.Account, SUM(LedgerAP.Amount) Amount
FROM
LedgerAP
WHERE
LedgerAP.Period >= 201500 AND LedgerAP.Account = N'105.71'
GROUP BY LedgerAP.Period, LedgerAP.Account
HAVING SUM(Amount) <> 0
UNION ALL
SELECT
LedgerAR.Period, LedgerAR.Account, SUM(LedgerAR.Amount)
FROM
LedgerAR
WHERE
LedgerAR.Period >= 201500 AND LedgerAR.Account = N'105.71'
GROUP BY LedgerAR.Period, LedgerAR.Account
UNION ALL
SELECT
LedgerEx.Period, LedgerEx.Account, SUM(LedgerEx.Amount)
FROM
LedgerEx
WHERE
LedgerEx.Period >= 201500 AND LedgerEx.Account = N'105.71'
GROUP BY LedgerEx.Period, LedgerEx.Account
UNION ALL
SELECT
LedgerMisc.Period, LedgerMisc.Account, SUM(LedgerMisc.Amount)
FROM
LedgerMisc
WHERE
LedgerMisc.Period >= 201500 AND LedgerMisc.Account = N'105.71'
GROUP BY LedgerMisc.Period, LedgerMisc.Account
I think you need to re-aggregate your results:
with l as (
<your query here>
)
select period, account, sum(amount)
from l
group by period, account;
You can do the same thing with a subquery instead of a CTE.
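As a small illustration of why the re-aggregation is needed, here is a sketch on in-memory SQLite with two hypothetical ledger tables (ledger_ap and ledger_ar, invented sample amounts). Each branch of the UNION ALL produces its own row per period and account; only the outer SUM collapses them into one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical ledgers sharing the same period/account keys.
    CREATE TABLE ledger_ap (period INTEGER, account TEXT, amount REAL);
    CREATE TABLE ledger_ar (period INTEGER, account TEXT, amount REAL);
    INSERT INTO ledger_ap VALUES (201501, '105.71', 100), (201501, '105.71', 50);
    INSERT INTO ledger_ar VALUES (201501, '105.71', -120), (201502, '105.71', -30);
""")

union_sql = """
    SELECT period, account, SUM(amount) AS amount
    FROM ledger_ap GROUP BY period, account
    UNION ALL
    SELECT period, account, SUM(amount)
    FROM ledger_ar GROUP BY period, account
"""

# Without re-aggregation: one row per ledger, so 201501 appears twice.
per_ledger = conn.execute(union_sql + " ORDER BY period, amount").fetchall()

# With re-aggregation: the outer query sums across the ledgers.
combined = conn.execute(f"""
    WITH l AS ({union_sql})
    SELECT period, account, SUM(amount) AS amount
    FROM l
    GROUP BY period, account
    ORDER BY period
""").fetchall()

print(per_ledger)
print(combined)
```

The first result shows the symptom from the question (a positive and a negative row for the same period), and the second shows the netted total per period and account.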

Oracle sub error on query

I added the following code to a SQL Server query and now have to do the same in Oracle. I need to do the grouping in the view rather than in C#. I get this error message:
ORA-01747 Invalid user.table.column or column specification.
How must I code this to work in Oracle?
SELECT CTE.FACILITY_KEY, CTE.DATE, CTE.PATIENT_STATUS, COUNT(*) AS [COUNT]
FROM CTE
GROUP BY CTE.FACILITY_KEY, CTE.DATE, CTE.PATIENT_STATUS;
Here is the full code, including the beginning of the query:
CREATE OR REPLACE VIEW DBD_V_CDL_CHANGES AS
WITH CTE AS
(
SELECT TR.FACILITY_KEY
, MV.VALUE_CODE
, CAST(COUNT(*) AS NUMERIC(9, 0)) COUNT
FROM OPTC.THS_T_TRANSACTIONS1 TR
JOIN OPTC.THS_M_MENU2 M
ON M.MENU_ID = TR.MENU_ID
JOIN OPTC.THS_M_VALUES MV
ON MV.MENU_ID = TR.MENU_ID_VALUE
JOIN OPTC.THS_M_VALUES MV2
ON MV2.MENU_ID = TR.PREVIOUS_MENU_ID_VALUE
JOIN OGEN.GEN_M_PATIENT_MAST PM
ON PM.PAT_NUMBER = TR.PAT_NUMBER
WHERE TR.TR_DATETIME BETWEEN TRUNC(SYSDATE)
AND TRUNC(SYSDATE) + 86399 / 86400
AND TR.EDIT_NO < 0
AND MV.VALUE_TYPE IS NULL
AND MV2.VALUE_TYPE IS NULL
AND MV.VALUE_CODE >= 0
AND MV2.VALUE_CODE >= 0
AND M.SUB_SYS_EXT = 'G1'
AND ABS(MV.VALUE_CODE - MV2.VALUE_CODE) > 1
AND (PM.DISCHARGE_DATE IS NULL OR PM.DISCHARGE_DATE < SYSDATE)
GROUP BY TR.FACILITY_KEY, MV.VALUE_CODE)
SELECT CTE.FACILITY_KEY, CTE.DATE, CTE.PATIENT_STATUS, COUNT(*) AS [COUNT] FROM CTE
GROUP BY CTE.FACILITY_KEY, CTE.DATE, CTE.PATIENT_STATUS;
I see a few things wrong with your code.
First, you are selecting the following three columns FACILITY_KEY, VALUE_CODE and the count in the CTE:
SELECT TR.FACILITY_KEY ,
MV.VALUE_CODE ,
COUNT(*) as Count -- note there is no need to CAST(COUNT(*) AS NUMERIC(9, 0)) this
FROM OPTC.THS_T_TRANSACTIONS1 TR
But then when you select from the CTE you are selecting columns that you are not returning in the CTE:
with cte as
(
-- your query here does not return DATE or PATIENT_STATUS
)
SELECT CTE.FACILITY_KEY,
CTE.DATE,
CTE.PATIENT_STATUS,
COUNT(*) AS COUNT
FROM CTE
GROUP BY CTE.FACILITY_KEY, CTE.DATE, CTE.PATIENT_STATUS;
Where do PATIENT_STATUS and DATE come from, since you are not including them in your CTE? These columns do not exist when you try to select them.
I replicated your error by selecting columns in the outer query that were not selected in the CTE query.
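The first issue is not specific to Oracle and can be reproduced in a sketch on in-memory SQLite. The transactions table and its rows are hypothetical, and SQLite reports the problem with its own message rather than an ORA- code, but the cause is the same: the outer query can only see the columns the CTE returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (facility_key INTEGER, value_code INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)", [(1, 5), (1, 5), (2, 7)])

# The CTE exposes exactly three columns: facility_key, value_code, total_cnt.
cte = """
    WITH cte AS (
        SELECT facility_key, value_code, COUNT(*) AS total_cnt
        FROM transactions
        GROUP BY facility_key, value_code
    )
"""

# Selecting columns the CTE returns works.
ok_rows = conn.execute(
    cte + "SELECT facility_key, value_code, total_cnt FROM cte ORDER BY facility_key"
).fetchall()

# Selecting a column the CTE does not return fails.
try:
    conn.execute(cte + "SELECT facility_key, patient_status FROM cte")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(ok_rows)
print(error)
```

Either add the missing columns to the CTE's select list and GROUP BY, or select only what the CTE already provides.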
The second issue is the CTE.DATE column. DATE is a reserved word, so place it in double quotes: CTE."DATE"
...AS [COUNT] is not Oracle syntax and will never work; simply remove the [ ]. Oracle accepts NUMERIC as an ANSI synonym for its native NUMBER type, but there is no need to CAST COUNT() at all. The COUNT() function always returns a number, either zero or some positive count.
This is valid syntax in Oracle:
SELECT deptno, count(*) total_count_by_dept -- no need to cast or AS --
FROM scott.emp
GROUP BY deptno
/
Try not to use reserved words such as COUNT for aliases:
SELECT CTE.FACILITY_KEY, CTE."DATE", CTE.PATIENT_STATUS, COUNT(*) AS total_cnt -- 'AS' is for clarity only, not required
FROM CTE
GROUP BY CTE.FACILITY_KEY, CTE."DATE", CTE.PATIENT_STATUS
/