SQL-query error in pyspark while using temp-table - sql

I have a SQL query that I need to run from PySpark (Databricks). Because the query is complex, PySpark is not able to read it. Can someone check my query and help me rewrite it as a single SELECT statement, without using a WITH clause?
Stage 1:
promotions="""
(WITH VCTE_Promotions as (SELECT v.Shortname, v.Employee_ID_ALT, v.Job_Level,
v.Management_Level, CAST(sysdatetime() AS date) AS PIT_Date, v.Employee_Status_Alt as Employee_Status,
v.Work_Location_Region, v.Work_Location_Country_Desc, v.HML,
[DM_GlobalStaff].[dbo].[V_Worker_PIT].Is_Manager
FROM [DM_GlobalStaff].[dbo].[V_Worker_CUR] as v
LEFT OUTER JOIN
[DM_GlobalStaff].[dbo].[V_Worker_PIT] ON v.Management_Level = [DM_GlobalStaff].[dbo].[V_Worker_PIT].Management_Level),
VCTE_Promotion_v2_Eval as (
SELECT Employee_ID_ALT,
( SELECT max([pit_date]) AS prior_data
FROM [DM_GlobalStaff].[dbo].[V_Worker_PIT] AS t
WHERE (employee_id_alt = a.Employee_ID_ALT) AND (PIT_Date < a.PIT_Date) AND (Is_Manager <> a.Is_Manager) OR
(employee_id_alt = a.Employee_ID_ALT) AND (PIT_Date < a.PIT_Date) AND (Job_Level <> a.Job_Level)) AS prev_job_change_date, Is_Manager
FROM VCTE_Promotions AS a)
SELECT VCTE_Promotion_v2_Eval.Employee_ID_ALT, COALESCE (v_cur.Employee_Status_ALT, N'') AS Curr_Emp_Status,
COALESCE (v_cur.Employee_Type, N'') AS Curr_Employee_Type, v_cur.Hire_Date_Alt AS Curr_Hire_Date,
v_cur.Termination_Date_ALT AS Curr_Termination_Date, COALESCE (v_cur.Termination_Action_ALT, N'')
AS Curr_Termination_Action, cast (v_cur.Job_Level as int) AS Curr_Job_Level,
COALESCE (v_cur.Management_Level, N'') AS Curr_Management_Level,
COALESCE (VCTE_Promotion_v2_Eval.Is_Manager, N'') AS Curr_Ismanager,
CASE WHEN v_m.Job_Level < v_cur.Job_Level OR
(VCTE_Promotion_v2_Eval.Is_Manager = 1 AND v_m.Is_Manager = 0 AND v_m.Job_Level <= v_cur.Job_Level)
THEN 'Promotion' WHEN v_m.Job_Level <> v_cur.Job_Level OR
VCTE_Promotion_v2_Eval.Is_Manager <> v_m.Is_Manager THEN 'Other' ELSE '' END AS Promotion, v_cur.Tenure,
v_cur.Review_Rating_Current
FROM VCTE_Promotion_v2_Eval INNER JOIN
[DM_GlobalStaff].[dbo].[V_Worker_CUR] as v_cur ON VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_cur.Employee_ID_ALT LEFT OUTER JOIN
[DM_GlobalStaff].[dbo].[V_Worker_PIT] as v_m ON VCTE_Promotion_v2_Eval.prev_job_change_date = v_m.PIT_Date AND
VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_m.employee_id_alt
) as pr """
Stage 2:
promotions = spark.read.jdbc(url=jdbcUrl, table=promotions, properties=connectionProperties)
Stage 3:
promotions.count()
promotions.show()
I get the error below from the Stage 2 query:
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'WITH'.
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<command-2532359884208251> in <module>()
----> 1 promotions = spark.read.jdbc(url=jdbcUrl, table=promotions, properties=connectionProperties)
/databricks/spark/python/pyspark/sql/readwriter.py in jdbc(self, url, table, column, lowerBound, upperBound, numPartitions, predicates, properties)
533 jpredicates = utils.toJArray(gateway, gateway.jvm.java.lang.String, predicates)
534 return self._df(self._jreader.jdbc(url, table, jpredicates, jprop))
--> 535 return self._df(self._jreader.jdbc(url, table, jprop))
536
537
There is nothing wrong with the query itself; it works perfectly in my SQL prompt. But as soon as I use the same query in PySpark (Databricks), I get a syntax error. Could you help me with the PySpark syntax as well?
Your prompt assistance will be highly appreciated.

I have no way of testing this, so please try it and compare the results to check that everything matches.
I am using CROSS APPLY instead of a correlated subquery, because there is no simple join and a correlated subquery isn't efficient, so CROSS APPLY should do the job.
(
SELECT
VCTE_Promotion_v2_Eval.Employee_ID_ALT
,COALESCE(v_cur.Employee_Type, N'') AS Curr_Employee_Type
,v_cur.Review_Rating_Current
FROM
(
SELECT
a.Employee_ID_ALT,
pr.prior_data AS prev_job_change_date,
a.IsManager
From
( SELECT
v.Shortname
,v.Employee_ID_ALT
,v.Job_Level
,v.Management_Level
,CAST(SYSDATETIME() AS DATE) AS PIT_Date
,v.Employee_Status_Alt AS Employee_Status
,v.Work_Location_Region
,v.Work_Location_Country_Desc
,v.HML
,dbo.T_Mngmt_Level_IsManager_Mapping.IsManager
FROM Worker_CUR AS v
LEFT OUTER JOIN dbo.T_Mngmt_Level_IsManager_Mapping
ON v.Management_Level = dbo.T_Mngmt_Level_IsManager_Mapping.Management_Level
) AS a
Cross APPLY (
SELECT
MAX(PIT_Date) AS prior_data
FROM dbo.V_Worker_PIT_with_IsManager AS t
WHERE (employee_id_alt = a.Employee_ID_ALT)
AND (PIT_Date < a.PIT_Date)
AND (IsManager <> a.IsManager)
OR (employee_id_alt = a.Employee_ID_ALT)
AND (PIT_Date < a.PIT_Date)
AND (Job_Level <> a.Job_Level)
)
AS pr
) as VCTE_Promotion_v2_Eval
INNER JOIN [DM_GlobalStaff].[dbo].[V_Worker_CUR] AS v_cur
ON VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_cur.Employee_ID_ALT
LEFT OUTER JOIN dbo.V_Worker_PIT_with_IsManager AS v_m
ON VCTE_Promotion_v2_Eval.prev_job_change_date = v_m.PIT_Date
AND VCTE_Promotion_v2_Eval.Employee_ID_ALT = v_m.employee_id_alt ) as promotions
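For background: Spark's JDBC reader places whatever is passed as `table` inside a FROM clause, and SQL Server does not accept a WITH clause inside a derived table, which is consistent with the error above. The general technique is to inline each CTE as a derived table. Below is a minimal sketch of that rewrite on toy data, using Python's built-in sqlite3 instead of SQL Server (so CROSS APPLY is replaced by a correlated subquery). The tables `worker_cur` and `worker_pit` and their columns are simplified, hypothetical stand-ins for the real views.

```python
import sqlite3

# Toy stand-ins for V_Worker_CUR / V_Worker_PIT (hypothetical, simplified columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE worker_cur (employee_id TEXT, job_level INTEGER);
    CREATE TABLE worker_pit (employee_id TEXT, pit_date TEXT, job_level INTEGER);
    INSERT INTO worker_cur VALUES ('E1', 3), ('E2', 2);
    INSERT INTO worker_pit VALUES
        ('E1', '2020-01-31', 2),
        ('E1', '2020-02-29', 2),
        ('E1', '2020-03-31', 3),
        ('E2', '2020-01-31', 2);
""")

# Version 1: the logic expressed with a CTE (WITH clause).
cte_query = """
    WITH eval AS (
        SELECT c.employee_id,
               (SELECT MAX(p.pit_date)
                  FROM worker_pit AS p
                 WHERE p.employee_id = c.employee_id
                   AND p.job_level <> c.job_level) AS prev_job_change_date
          FROM worker_cur AS c
    )
    SELECT employee_id, prev_job_change_date
      FROM eval
     ORDER BY employee_id
"""

# Version 2: the same CTE body inlined as a derived table, so the statement
# starts with SELECT and can itself be wrapped in "( ... ) AS alias".
inlined_query = """
    SELECT eval.employee_id, eval.prev_job_change_date
      FROM (
        SELECT c.employee_id,
               (SELECT MAX(p.pit_date)
                  FROM worker_pit AS p
                 WHERE p.employee_id = c.employee_id
                   AND p.job_level <> c.job_level) AS prev_job_change_date
          FROM worker_cur AS c
      ) AS eval
     ORDER BY eval.employee_id
"""

cte_rows = conn.execute(cte_query).fetchall()
inlined_rows = conn.execute(inlined_query).fetchall()
print(cte_rows)
print(inlined_rows)
```

Both statements should return the same rows; only the second form starts with SELECT, which is what a wrapped JDBC subquery needs.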

Related

How to solve this error: Cannot perform an aggregate function on an expression containing an aggregate or a subquery?

I have a query that works on SQL Anywhere but, for some reason, does not work on SQL Server (run from SSMS). Can you tell me what is wrong with it? I get the error: Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
SELECT (CASE
WHEN EXISTS (select 1 from READCHK_READSEQ_OVRCOMP where READCHK_ID = :readChkId AND READSEQ_ID = :readSeqId) THEN 'M'
WHEN ( COUNT(*) = Coalesce(SUM((SELECT COUNT(*) FROM READCHKFLEX flx WHERE flx.READCHKFLD_ID = rcfld.READCHKFLD_ID AND flx.READCHK_ID = :readChkId) ), 0 )) THEN 'A'
ELSE 'I' END) CompletionStatus
FROM READSEQFLD rsfld
JOIN READSEQ rseq on rseq.READSEQ_ID = rsfld.READSEQ_ID
JOIN READCHKPAGEFLD rcpf on rcpf.READCHKTYPE_ID = rseq.READCHKTYPE_ID and rcpf.READCHKFLD_ID = rsfld.READCHKFLD_ID
JOIN READCHKFLD rcfld on rcfld.READCHKFLD_ID = rcpf.READCHKFLD_ID
WHERE rsfld.READSEQ_ID = :readSeqId
AND Upper(rsfld.ACTIVATE_FLG) = 'Y'
AND rcfld.CALFORM_ID IS NULL
AND (rcpf.fld_properties is null or Upper(rcpf.fld_properties) <> 'HIDE');
How to fix that?
SQL Server doesn't allow aggregates on expressions containing subqueries.
I think the following should preserve your original behaviour.
Add a CTE definition before the existing SELECT:
WITH flx_agg
AS (SELECT COUNT(*) AS Cnt,
READCHKFLD_ID
FROM READCHKFLEX flx
WHERE flx.READCHK_ID = :readChkId
GROUP BY READCHKFLD_ID)
SELECT ...
Add an outer join to that CTE after your existing joins:
LEFT JOIN flx_agg
ON flx_agg.READCHKFLD_ID = rcfld.READCHKFLD_ID
Change the offending CASE WHEN condition to:
WHEN COUNT(*) = COALESCE(SUM(flx_agg.Cnt), 0)
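To sanity-check the rewrite on toy data, here is a minimal sketch using Python's built-in sqlite3 rather than SQL Server. The tables `fld` and `flex` and their columns are made-up stand-ins for READCHKFLD and READCHKFLEX; the reference value is computed row by row in Python, which mimics what the original per-row subquery did.

```python
import sqlite3

# Hypothetical miniature versions of READCHKFLD (fld) and READCHKFLEX (flex).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fld (fld_id INTEGER);
    CREATE TABLE flex (fld_id INTEGER, chk_id INTEGER);
    INSERT INTO fld VALUES (1), (2), (3);
    INSERT INTO flex VALUES (1, 10), (2, 10), (2, 10), (2, 10), (3, 99);
""")

# Pre-aggregate the counts in a CTE, LEFT JOIN it, then SUM the joined counts.
# The CTE has one row per fld_id, so the join cannot duplicate rows of fld.
rewritten = """
    WITH flx_agg AS (
        SELECT fld_id, COUNT(*) AS cnt
          FROM flex
         WHERE chk_id = :chk_id
         GROUP BY fld_id
    )
    SELECT COUNT(*) AS field_count,
           COALESCE(SUM(flx_agg.cnt), 0) AS flex_count
      FROM fld
      LEFT JOIN flx_agg ON flx_agg.fld_id = fld.fld_id
"""
field_count, flex_count = conn.execute(rewritten, {"chk_id": 10}).fetchone()

# Reference: run the per-row count separately for each field and add them up.
reference = sum(
    conn.execute(
        "SELECT COUNT(*) FROM flex WHERE fld_id = ? AND chk_id = ?", (fld_id, 10)
    ).fetchone()[0]
    for (fld_id,) in conn.execute("SELECT fld_id FROM fld")
)

# With no matching flex rows at all, COALESCE turns the NULL sum into 0.
_, empty_count = conn.execute(rewritten, {"chk_id": 12345}).fetchone()
print(field_count, flex_count, reference, empty_count)
```

Fields without any matching rows contribute NULL to the SUM, which is why the COALESCE is kept around the aggregate.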

Syntax error on CASE Statement join in subquery

I have the query below, with a subquery that returns a SUM value. I want to join PS_VOUCHER_LINE LineSub to PS_VOUCHER_LINE Line on the extra condition Line.LINE_NBR = LineSub.LINE_NBR only when a VOUCHER_ID has more than one LINE_NBR; otherwise I don't want this last join condition to apply. I am getting a syntax error on the statement (Incorrect syntax near '='). How can I get this conditional join to work properly?
SELECT
CONCAT(Header.BUSINESS_UNIT, Header.VOUCHER_ID) AS INVOICE_ID
,
(
SELECT SUM(LineSub.MERCHANDISE_AMT)
FROM PS_VOUCHER_LINE LineSub
WHERE Line.BUSINESS_UNIT = LineSub.BUSINESS_UNIT
AND Line.VOUCHER_ID = LineSub.VOUCHER_ID
AND
CASE
WHEN COUNT(Line.LINE_NBR) > 1 THEN Line.LINE_NBR = LineSub.LINE_NBR
END
GROUP BY LineSub.VOUCHER_ID
) + Header.FREIGHT_AMT + Header.SALETX_AMT AS GROSS_AMT_LINE_FREIGHT_TAX
FROM
PS_VOUCHER Header
INNER JOIN PS_VOUCHER_LINE Line ON Line.BUSINESS_UNIT = Header.BUSINESS_UNIT
AND Line.VOUCHER_ID = Header.VOUCHER_ID
WHERE
Header.VOUCHER_ID = '00241107'
Just remove the CASE and add a HAVING condition instead:
GROUP BY LineSub.VOUCHER_ID HAVING COUNT(Line.LINE_NBR) > 1

Snowflake unsupported subquery when using function

This is the function I created:
CREATE OR REPLACE FUNCTION NS_REPORTS.AP."COUPA_GET_EXCH_RATE"("from_curr_id" NUMBER(38,0), "to_curr_id" NUMBER(38,0), "date" DATE)
RETURNS FLOAT
LANGUAGE SQL
AS '
SELECT
COALESCE((
SELECT
RATE
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY DATE(RATE_DATE) ORDER BY RATE_DATE DESC) ROW_NUM
, RATE
FROM
CONNECTORS.COUPA.EXCHANGE_RATE
WHERE
FROM_CURRENCY_ID = from_curr_id
AND TO_CURRENCY_ID = to_curr_id
AND DATE(RATE_DATE) = date
) R
WHERE
ROW_NUM = 1
), 1)
';
I'm using the ROW_NUMBER function because the RATE_DATE field is actually datetime and so there are multiple records per date.
When I call the function by itself, it works fine. However, when I try to use it in a view, I get the unsupported subquery type error. The view works fine without it. Can anyone think of what I can do to either fix the error or work around it by rewriting the query?
EDIT 1: View code and exact error message
CREATE OR REPLACE VIEW COUPA_REQUISITION
AS
SELECT
RH.ID REQ_NUM
, RL.LINE_NUM REQ_LINE_NUM
, OH.PO_NUMBER
, REPLACE(REPLACE(OH.CUSTOM_FIELDS:"legacy-po-number", '"', ''), '.0', '') LEGACY_PO_NUMBER
, S."NAME" SUPPLIER
, OH.STATUS
, UR.FULLNAME REQUESTED_BY
, UC.FULLNAME CREATED_BY
, OL.RECEIVED
, DATE(RH.SUBMITTED_AT) ORDER_DATE
, DATE(RH.NEED_BY_DATE) NEEDED_BY_DATE
, RL."DESCRIPTION" ITEM
, CAST(NULL AS VARCHAR) CHART_OF_ACCOUNTS
, REPLACE(OH.CUSTOM_FIELDS:"purchase-type", '"', '') PURCHASE_TYPE
, COM."NAME" COMMODITY
, ACT.NS_SUB_NAME SUBSIDIARY
, ACT.NS_ACCT_NAME_FULL "ACCOUNT"
, ACT.NS_DEPT_NAME_FULL DEPARTMENT
, ACT.NS_L3_DEPT_NAME L3_DEPARTMENT
, ACT.NS_LOC_NAME "LOCATION"
, RL.QUANTITY QTY
, OL.LINE_NUM ORDER_LINE_NUM
, RL.TOTAL * NS_REPORTS.AP.COUPA_GET_EXCH_RATE(RL.CURRENCY_ID, 1, DATE(RH.SUBMITTED_AT)) LINE_TOTAL
, RL.TOTAL - OL.INVOICED UNINVOICED_AMOUNT
, OL.INVOICED INVOICED_TOTAL
, RLSUM.TOTAL TOTAL
, REPLACE(IL.CUSTOM_FIELDS:"amortization-schedule"."name", '"', '') AMORTIZATION_SCHEDULE
, CASE WHEN COALESCE(IL.CUSTOM_FIELDS:"amortization-start-date", '') <> '' THEN DATE(REPLACE(IL.CUSTOM_FIELDS:"amortization-start-date", '"', '')) ELSE NULL END AMORTIZATION_START_DATE
, CASE WHEN COALESCE(IL.CUSTOM_FIELDS:"amortization-end-date", '') <> '' THEN DATE(REPLACE(IL.CUSTOM_FIELDS:"amortization-end-date", '"', '')) ELSE NULL END AMORTIZATION_END_DATE
, CASE WHEN COALESCE(OH.CUSTOM_FIELDS:"contract-start-date", '') <> '' THEN DATE(REPLACE(OH.CUSTOM_FIELDS:"contract-start-date", '"', '')) ELSE NULL END CONTRACT_START_DATE
, CASE WHEN COALESCE(OH.CUSTOM_FIELDS:"contract-end-date", '') <> '' THEN DATE(REPLACE(OH.CUSTOM_FIELDS:"contract-end-date", '"', '')) ELSE NULL END CONTRACT_END_DATE
FROM
CONNECTORS.COUPA.REQUISITION_HEADER RH
JOIN CONNECTORS.COUPA.REQUISITION_LINE RL ON RL.REQUISITION_HEADER_ID = RH.ID
JOIN NS_REPORTS.AP.COUPA_ACCOUNT ACT ON ACT.COUPA_ACCT_ID = RL.ACCOUNT_ID
JOIN CONNECTORS.COUPA."USER" UR ON UR.ID = RH.REQUESTED_BY_ID
JOIN CONNECTORS.COUPA."USER" UC ON UC.ID = RH.CREATED_BY_ID
JOIN (
SELECT
REQUISITION_HEADER_ID
, SUM(TOTAL) TOTAL
FROM
CONNECTORS.COUPA.REQUISITION_LINE
GROUP BY
REQUISITION_HEADER_ID
) RLSUM ON RLSUM.REQUISITION_HEADER_ID = RH.ID
LEFT JOIN CONNECTORS.COUPA.ORDER_LINE OL ON OL.ID = RL.ORDER_LINE_ID
LEFT JOIN CONNECTORS.COUPA.ORDER_HEADER OH ON OH.ID = OL.ORDER_HEADER_ID
LEFT JOIN CONNECTORS.COUPA.COMMODITY COM ON COM.ID = OL.COMMODITY_ID
LEFT JOIN CONNECTORS.COUPA.SUPPLIER S ON S.ID = OH.SUPPLIER_ID
LEFT JOIN CONNECTORS.COUPA.INVOICE_LINE IL ON IL.ORDER_LINE_ID = OL.ID
Error message:
SQL Error [2031] [42601]: SQL compilation error:
Unsupported subquery type cannot be evaluated
The error is caused by a correlated subquery, and these are not supported (beyond some tiny toy examples).
The basic form is:
SELECT a.a,
(select b.b from b where b.a = a.a order by b.y limit 1)
FROM a;
In effect, for each row a subquery is run against another table to get a value. Other databases use many tricks to make this "work", but in effect there is work done on each row. Snowflake does not support these kinds of per-row operations.
The good news is that there are other patterns that are effectively the same and that Snowflake does support. The two patterns are really one: use a CTE, or join to a sub-select, which is the same thing.
So the above becomes:
WITH subquery AS (
SELECT b.a, b.b FROM b
QUALIFY row_number() over (partition by b.a order by b.y) = 1
)
SELECT a.a,
sq.b
FROM a
JOIN subquery AS sq
ON sq.a = a.a
So we first process/shape "all records" from the other/sub table, keep only the rows that have the count/shape we want, and then join to that result. This is very parallelizable, so it performs well. The reason Snowflake does not auto-translate a subquery for you is that it is rather easy to get wrong, and they are presently spending their development effort on features that do not exist at all yet. It can be rewritten by you, given that you understand your model.
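As a rough check that the two shapes agree, here is a minimal sketch on toy data using Python's built-in sqlite3 rather than Snowflake. SQLite has no QUALIFY, so the row-number filter is written as a join condition on a hypothetical `rn` column, and window functions assume SQLite 3.25 or newer. The sketch uses LEFT JOIN so that rows of `a` with no match in `b` are kept with NULL, as the scalar subquery would return them; a plain JOIN would drop those rows.

```python
import sqlite3

# Toy tables named after the a / b example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a INTEGER);
    CREATE TABLE b (a INTEGER, b TEXT, y INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 'late', 5), (1, 'early', 2), (2, 'only', 7);
""")

# Shape 1: correlated scalar subquery, evaluated once per row of a.
correlated = """
    SELECT a.a,
           (SELECT b.b FROM b WHERE b.a = a.a ORDER BY b.y LIMIT 1) AS b
      FROM a
     ORDER BY a.a
"""

# Shape 2: rank all rows of b once, keep rank 1 per key, then join.
joined = """
    WITH ranked AS (
        SELECT b.a, b.b,
               ROW_NUMBER() OVER (PARTITION BY b.a ORDER BY b.y) AS rn
          FROM b
    )
    SELECT a.a, sq.b
      FROM a
      LEFT JOIN ranked AS sq
        ON sq.a = a.a AND sq.rn = 1
     ORDER BY a.a
"""

correlated_rows = conn.execute(correlated).fetchall()
joined_rows = conn.execute(joined).fetchall()
print(correlated_rows)
print(joined_rows)
```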
What if you move this to the FROM clause? I would phrase this as:
SELECT COALESCE(MAX(er.rate), 1)
FROM (SELECT er.*
FROM CONNECTORS.COUPA.EXCHANGE_RATE er
WHERE er.FROM_CURRENCY_ID = in_from_curr_id AND
er.TO_CURRENCY_ID = in_to_curr_id AND
DATE(er.RATE_DATE) = in_date
ORDER BY RATE_DATE DESC
LIMIT 1
) er;
Notice that I changed the names of the parameters so they are more obviously input parameters.

Converting Oracle queries to BigQuery Standard SQL: error "IN subquery is not supported inside join predicate."

I have converted an Oracle query into the BigQuery Standard SQL below, but the last condition (the IN subquery) gives me this error:
"IN subquery is not supported inside join predicate."
Please advise how to use an IN subquery in BigQuery in the code below.
#Last part of the code
INNER JOIN (
SELECT
DISTINCT `domain-rr.oracle_DB_DB.he_project_assoc`.PARENT_ISBN
PARENT_ISBN,
SUM (`domain-rr.DB_RPT.PROJECT_GR_QTY`.GR_QTY) GR_QTY
FROM
`domain-rr.oracle_DB_DB.he_project_assoc`
INNER JOIN
`domain-rr.DB_RPT.PROJECT_GR_QTY`
ON
`domain-rr.oracle_DB_DB.he_project_assoc`.child_ISBN = `domain-rr.DB_RPT.PROJECT_GR_QTY`.BIC_GCISBN
AND `domain-rr.oracle_DB_DB.he_project_assoc`.BREAK_LABEL <>
'Associated ISBNs'
GROUP BY
`domain-rr.oracle_DB_DB.he_project_assoc`.PARENT_ISBN) xx
ON
yy.PARENT_ISBN = xx.PARENT_ISBN
AND yy.CIRCULATION_INT < xx.GR_QTY
AND yy.PARENT_ISBN IN
( SELECT
DISTINCT _BIC_GCISBN
FROM
`domain-rr.DB_RPT.BIC_GM_AGCPOAODS00_BO_VW`
INNER JOIN
`domain-rr.oracle_DB_boadmin.fiscal_bo`
ON
_BIC_ZC2GRIRIN = 'G'
AND _BIC_ZCLOEKZ = ' '
AND SUBSTR (BOUND_DATE, 1, 6) = `domain-rr.oracle_DB_boadmin.fiscal_bo`.PRIOR_FISC_YEAR_MONTH )
Can you try it like this, with the IN subquery moved out of the join predicate into an outer WHERE clause:
Select * from (
#Last part of the code
INNER JOIN (
SELECT
DISTINCT `pearson-rr.oracle_grdw_grdw.he_project_assoc`.PARENT_ISBN
PARENT_ISBN,
SUM (`pearson-rr.GRDW_RPT.PROJECT_GR_QTY`.GR_QTY) GR_QTY
FROM
`pearson-rr.oracle_grdw_grdw.he_project_assoc`
INNER JOIN
`pearson-rr.GRDW_RPT.PROJECT_GR_QTY`
ON
`pearson-rr.oracle_grdw_grdw.he_project_assoc`.child_ISBN = `pearson-rr.GRDW_RPT.PROJECT_GR_QTY`.BIC_GCISBN
AND `pearson-rr.oracle_grdw_grdw.he_project_assoc`.BREAK_LABEL <>
'Associated ISBNs'
GROUP BY
`pearson-rr.oracle_grdw_grdw.he_project_assoc`.PARENT_ISBN) xx
ON
yy.PARENT_ISBN = xx.PARENT_ISBN
AND yy.CIRCULATION_INT < xx.GR_QTY
) AA
where AA.PARENT_ISBN IN
( SELECT
DISTINCT _BIC_GCISBN
FROM
`pearson-rr.GRDW_RPT.BIC_GM_AGCPOAODS00_BO_VW`
INNER JOIN
`pearson-rr.oracle_grdw_boadmin.fiscal_bo`
ON
_BIC_ZC2GRIRIN = 'G'
AND _BIC_ZCLOEKZ = ' '
AND SUBSTR (BOUND_DATE, 1, 6) = `pearson-rr.oracle_grdw_boadmin.fiscal_bo`.PRIOR_FISC_YEAR_MONTH )
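For an INNER JOIN, a filter in the ON clause and the same filter in an outer WHERE clause select the same rows, which is why this move is safe. Below is a minimal sketch of that equivalence on toy data, using Python's built-in sqlite3 instead of BigQuery (SQLite accepts both forms, so they can be compared directly). The tables `yy`, `xx`, and `allowed` and their contents are hypothetical stand-ins.

```python
import sqlite3

# Hypothetical miniature tables; "allowed" plays the role of the IN subquery source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yy (parent_isbn TEXT, circulation_int INTEGER);
    CREATE TABLE xx (parent_isbn TEXT, gr_qty INTEGER);
    CREATE TABLE allowed (isbn TEXT);
    INSERT INTO yy VALUES ('A', 1), ('B', 1), ('C', 50);
    INSERT INTO xx VALUES ('A', 10), ('B', 10), ('C', 10);
    INSERT INTO allowed VALUES ('A'), ('C');
""")

# Form 1: IN subquery inside the join predicate (the form BigQuery rejects).
in_join = """
    SELECT yy.parent_isbn
      FROM yy
     INNER JOIN xx
        ON yy.parent_isbn = xx.parent_isbn
       AND yy.circulation_int < xx.gr_qty
       AND yy.parent_isbn IN (SELECT isbn FROM allowed)
     ORDER BY yy.parent_isbn
"""

# Form 2: join first in a derived table, then filter with IN in the outer WHERE.
in_where = """
    SELECT aa.parent_isbn
      FROM (
        SELECT yy.parent_isbn
          FROM yy
         INNER JOIN xx
            ON yy.parent_isbn = xx.parent_isbn
           AND yy.circulation_int < xx.gr_qty
      ) AS aa
     WHERE aa.parent_isbn IN (SELECT isbn FROM allowed)
     ORDER BY aa.parent_isbn
"""

in_join_rows = conn.execute(in_join).fetchall()
in_where_rows = conn.execute(in_where).fetchall()
print(in_join_rows)
print(in_where_rows)
```

Note that this equivalence holds for inner joins only; with an outer join, moving a condition from ON to WHERE can change the result.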

ORA-00936:missing expression -- oracle

I found some related questions on Stack Overflow:
ORA-00936: missing expression oracle
ORA-00936: missing expression Oracle Apex
But those do not fit my case:
(SELECT t_1.oi1name OI1NAME
FROM ( select oi1.name oi1name, oi.name oname, b.prodesc, b.foundtime, b.occurrencetime, b.divisionproject, b.pilenumber, b.constructionteam, b.progress, h.pk_group
from zspm_qa_monthlyreport_b b
left outer join zspm_qa_monthlyreport_h h on b.pk_monthlyreport_h = h.pk_monthlyreport_h
left outer join org_itemorg oi on oi.pk_itemorg = h.pk_org
left outer join org_itemorg oi1 on oi1.pk_itemorg = oi.pk_fatherorg
where h.dr = 0 and h.billstatus = 1 and b.dr = 0
and oi.code like CONCAT ( ( select code from org_itemorg where pk_itemorg in () ), '%' ) and h.def1 = '2016-01' ) t_1
WHERE t_1.pk_group = '0001A2100000000007QL')
This is my SQL query, but I don't know where the issue is.
The IN list is empty. I don't think that is allowed.
More importantly, you probably intend this logic:
where . . . and
exists (select 1
from org_itemorg oio
where pk_itemorg in (. . .) and
oi.code like oio.code || '%'
) and
h.def1 = '2016-01'
Your subquery can return more than one row, which would be a problem when you run the query.
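A minimal sketch of the EXISTS form on toy data, using Python's built-in sqlite3 instead of Oracle: the table `org` and its rows are hypothetical stand-ins for org_itemorg, and the keys 'p1' and 'p2' are made up to fill the IN list for illustration. Because EXISTS only asks whether any matching row exists, it copes with several prefix rows, where a scalar subquery inside CONCAT would not.

```python
import sqlite3

# Hypothetical stand-in for org_itemorg: two parent codes and three candidates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org (pk TEXT, code TEXT);
    INSERT INTO org VALUES
        ('p1', '01'), ('p2', '02'),
        ('c1', '0101'), ('c2', '0205'), ('c3', '0301');
""")

# Keep every row whose code starts with the code of ANY selected parent row.
query = """
    SELECT oi.pk
      FROM org AS oi
     WHERE EXISTS (SELECT 1
                     FROM org AS oio
                    WHERE oio.pk IN ('p1', 'p2')
                      AND oi.code LIKE oio.code || '%')
     ORDER BY oi.pk
"""

matches = [pk for (pk,) in conn.execute(query)]
print(matches)
```

The parent rows match themselves here, since each code trivially starts with itself.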