Convert ANSI SQL correlated subquery to Spark SQL - apache-spark-sql

I have the following Oracle subquery that I'm having trouble converting to Spark SQL.
SELECT ID,
( SELECT v1.COL1
FROM VIEW1 v1
WHERE v1.COL2 = (SELECT COL2
FROM VIEW2 v2
WHERE v2.ID = v3.ID)
AND v1.COL3 = (SELECT v2.COL3
FROM VIEW2 v2
WHERE v2.COL3 = v3.COL3)
)
FROM VIEW3 v3
I have tried to run it in the following format but instead of returning one row, the subquery is returning multiple rows:
SELECT v3.id,
v1.col1
FROM view3 v3
LEFT OUTER JOIN view1 v1
ON ( EXISTS(
SELECT 1
FROM view2 v2a
WHERE v2a.col3 = v3.col3
AND v2a.col3 = v1.col3
)
AND EXISTS(
SELECT 1
FROM view2 v2b
WHERE v2b.id = v3.id
AND v2b.col2 = v1.col2
)
)
Do you know if there is an online tool that can do these conversions for me?
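One likely reason for the extra rows: a scalar subquery yields at most one value per outer row (Oracle raises ORA-01427 otherwise), while a join emits one row per match. The sketch below reproduces this with Python's built-in sqlite3 module. The table contents are invented for illustration, and SQLite only stands in as a convenient engine: unlike Oracle, it silently takes the first row of a multi-row scalar subquery. Collapsing with MIN is just one possible way to get back to one row per ID, assuming any single matching value is acceptable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE view1 (col1 TEXT, col2 INTEGER, col3 INTEGER);
CREATE TABLE view2 (id INTEGER, col2 INTEGER, col3 INTEGER);
CREATE TABLE view3 (id INTEGER, col3 INTEGER);
INSERT INTO view3 VALUES (1, 10);
INSERT INTO view2 VALUES (1, 100, 10);
-- Assumed data: two view1 rows match the same (col2, col3) pair.
INSERT INTO view1 VALUES ('a', 100, 10), ('b', 100, 10);
""")

# Join condition taken from the attempted rewrite in the question.
JOIN_ON = """
    EXISTS (SELECT 1 FROM view2 v2a
            WHERE v2a.col3 = v3.col3 AND v2a.col3 = v1.col3)
AND EXISTS (SELECT 1 FROM view2 v2b
            WHERE v2b.id = v3.id AND v2b.col2 = v1.col2)
"""

# The join form: one output row per matching view1 row.
join_rows = conn.execute(
    "SELECT v3.id, v1.col1 FROM view3 v3 "
    "LEFT OUTER JOIN view1 v1 ON " + JOIN_ON + " ORDER BY v1.col1"
).fetchall()

# The original scalar-subquery form: exactly one output row per view3 row.
scalar_rows = conn.execute("""
SELECT v3.id,
       (SELECT v1.col1 FROM view1 v1
        WHERE v1.col2 = (SELECT v2.col2 FROM view2 v2 WHERE v2.id = v3.id)
          AND v1.col3 = (SELECT v2.col3 FROM view2 v2 WHERE v2.col3 = v3.col3))
FROM view3 v3
""").fetchall()

# One possible fix: aggregate the join back to one row per id.
collapsed_rows = conn.execute(
    "SELECT v3.id, MIN(v1.col1) FROM view3 v3 "
    "LEFT OUTER JOIN view1 v1 ON " + JOIN_ON + " GROUP BY v3.id"
).fetchall()

print(len(join_rows), len(scalar_rows), collapsed_rows)
```

If the duplicates in the real data are unexpected, it may be worth checking which of VIEW1 or VIEW2 has non-unique keys before choosing an aggregate.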

Related

Is it possible to replace a cross apply with a join?

I am reverse engineering some legacy SQL algorithms to move to Apache Spark.
I have encountered a CROSS APPLY, which I understand is T-SQL specific and has no direct equivalent in ANSI or Spark SQL.
The sanitized algorithm is:
SELECT
Id_P ,
Monthindex ,
(
SELECT
100 * (STDEV(ResEligible.num_valid) / AVG(ResEligible.num_valid)) AS Pre_Coef_Var
FROM
tbl_p a CROSS APPLY
(
SELECT
e.Monthindex ,
e.num AS num_valid
FROM
dbo.tbl_p e
WHERE
e.Monthindex = a.MonthIndex
AND e.Id_P = a.Id_P
UNION ALL
SELECT DISTINCT
B1.[MonthIndex ] ,
Tr.num AS num_valid
FROM
#tbl_pr B1
INNER JOIN
#tbl_pr B2
ON
B1.[Id_P] = B2.[Id_P]
AND B2.Rang - B1.Rang BETWEEN 0 AND 2
INNER JOIN
dbo.tbl_p Tr
ON
Tr.Id_P = B1.Id_P
AND Tr.Monthindex = B1.Monthindex
WHERE
a.Id_P = B1.[Id_P]
AND B2.[MonthIndex] =
(
SELECT
MAX([MonthIndex])
FROM
#tbl_pr
WHERE
[MonthIndex] < a.MonthIndex
AND [Id_P] = a.Id_P) ) AS ResEligible
WHERE
a.Id_P = result.Id_P
AND a.MonthIndex = result.MonthIndex) AS Coeff
FROM
tbl_p AS result
WHERE
1 = 1
AND MonthIndex = #CurrentMonth
GROUP BY
Id_P ,
Monthindex) AS CC
So for every row in alias a (tbl_p), we cross apply to the inner queries.
Is it possible to re-write the cross apply in terms of join operations (or otherwise) so I can re-implement in spark sql?
Cheers
Terry
Seems like you could rewrite your query as the below:
SELECT T1.col1,
T1.col2,
sq.col3Sum
FROM tbl1 T1
CROSS JOIN (SELECT SUM(T1sq.Col3) AS col3Sum
FROM tbl1 T1sq
JOIN tbl2 T2 ON T1sq.Col1 = T2.Col2
JOIN tbl3 T3 ON T2.col1 = T3.Col1) sq;
Seems odd, however, that there was no JOIN criteria between the 2 references to tbl1.
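When the applied subquery does not reference the outer row at all, it is evaluated once and its result is attached to every outer row, which is why a plain CROSS JOIN against a derived table can replace it. A minimal sketch of the query above, run through Python's built-in sqlite3 module; the table layouts and sample values are assumptions made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schemas and data, not taken from the original post.
CREATE TABLE tbl1 (col1 INTEGER, col2 TEXT, col3 INTEGER);
CREATE TABLE tbl2 (col1 INTEGER, col2 INTEGER);
CREATE TABLE tbl3 (col1 INTEGER);
INSERT INTO tbl1 VALUES (1, 'x', 5), (2, 'y', 7), (3, 'z', 11);
INSERT INTO tbl2 VALUES (10, 1), (20, 2);
INSERT INTO tbl3 VALUES (10), (20);
""")

# The derived table sq is uncorrelated, so it produces a single row
# that the CROSS JOIN pairs with every row of tbl1.
rows = conn.execute("""
SELECT T1.col1, T1.col2, sq.col3Sum
FROM tbl1 T1
CROSS JOIN (SELECT SUM(T1sq.col3) AS col3Sum
            FROM tbl1 T1sq
            JOIN tbl2 T2 ON T1sq.col1 = T2.col2
            JOIN tbl3 T3 ON T2.col1 = T3.col1) sq
ORDER BY T1.col1
""").fetchall()

for row in rows:
    print(row)
```

A correlated CROSS APPLY (one that references the outer row, as in the question) would instead need join conditions carrying those references.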

I need more than a row PostgreSQL

I need a PostgreSQL subquery that can return more than a row. Here's the piece of the query that I have so far:
select (SELECT ARRAY[url, thumb_1, thumb_200, thumb_500]
FROM "Image"
LEFT JOIN "Product_Image"
ON "Image".id = "Product_Image".image_id
WHERE "Product_Image".product_id = 517
ORDER BY "Product_Image".sort ASC) as images
Put the subquery in the FROM clause?
select vals
from (SELECT ARRAY[url, thumb_1, thumb_200, thumb_500] as vals
FROM "Image" LEFT JOIN
"Product_Image"
ON "Image".id = "Product_Image".image_id
WHERE "Product_Image".product_id = 517
) images
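A rough illustration of why moving the subquery into the FROM clause helps: there it acts as a derived table and may return any number of rows. The sketch uses Python's built-in sqlite3 module with invented sample rows. SQLite has no ARRAY constructor, so a single url column stands in for the array, and the sort column is carried out of the derived table so the outer query can order by it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, assumed schema: only the columns needed for the demo.
CREATE TABLE "Image" (id INTEGER, url TEXT);
CREATE TABLE "Product_Image" (image_id INTEGER, product_id INTEGER, sort INTEGER);
INSERT INTO "Image" VALUES (1, 'a.png'), (2, 'b.png'), (3, 'c.png');
INSERT INTO "Product_Image" VALUES (1, 517, 2), (2, 517, 1), (3, 999, 1);
""")

# The derived table returns one row per image of product 517.
rows = conn.execute("""
SELECT images.url
FROM (SELECT "Image".url, "Product_Image".sort
      FROM "Image"
      LEFT JOIN "Product_Image"
        ON "Image".id = "Product_Image".image_id
      WHERE "Product_Image".product_id = 517) images
ORDER BY images.sort ASC
""").fetchall()

print(rows)
```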

trying to merge 2 views

I need to retrieve the data provided by this view :
BLICK_1_DESCR_LIST.
I couldn't work out how to create it directly, so I created the view BLICK_1_DESCR_NO_LIST, which is used by the second view, BLICK_1_DESCR_LIST.
I would prefer to do this in a single view.
CREATE VIEW BLICK_1_DESCR_NO_LIST
AS SELECT ITEM_ID , MIN(ITEM_DESCR_NO) MIN_I_D_NO,
COUNT(ITEM_DESCR_NO) COUNT_I_D_NO FROM BLICK_ITEM_DESCR
GROUP BY ITEM_ID
UNION
SELECT ID , 0 ZERO, 0 ZERO2 FROM BLICK_ITEM
LEFT JOIN BLICK_ITEM_DESCR ON BLICK_ITEM.ID = BLICK_ITEM_DESCR.ITEM_ID
WHERE ITEM_DESCR_NO IS NULL;
CREATE VIEW BLICK_1_DESCR_LIST
AS SELECT V1.ITEM_ID, V1.MIN_I_D_NO, V1.COUNT_I_D_NO, T1.ITEM_DESCR
FROM BLICK_1_DESCR_NO_LIST V1
LEFT JOIN BLICK_ITEM_DESCR T1 ON V1.ITEM_ID = T1.ITEM_ID
AND V1.MIN_I_D_NO = T1.ITEM_DESCR_NO
ORDER BY ITEM_ID;
You can just incorporate the first view as a subquery:
CREATE VIEW BLICK_1_DESCR_LIST AS
SELECT V1.ITEM_ID, V1.MIN_I_D_NO, V1.COUNT_I_D_NO, T1.ITEM_DESCR
FROM ((SELECT ITEM_ID, MIN(ITEM_DESCR_NO) as MIN_I_D_NO,
COUNT(ITEM_DESCR_NO) as COUNT_I_D_NO
FROM BLICK_ITEM_DESCR
GROUP BY ITEM_ID
) UNION
(SELECT ID, 0, 0
FROM BLICK_ITEM LEFT JOIN
BLICK_ITEM_DESCR
ON BLICK_ITEM.ID = BLICK_ITEM_DESCR.ITEM_ID
WHERE ITEM_DESCR_NO IS NULL
)) V1 LEFT JOIN
BLICK_ITEM_DESCR T1
ON V1.ITEM_ID = T1.ITEM_ID AND
V1.MIN_I_D_NO = T1.ITEM_DESCR_NO
ORDER BY ITEM_ID;
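To check that inlining the helper view does not change the result, the two forms can be compared on a small data set. This is a sketch using Python's built-in sqlite3 module; the sample rows and the ITEM_DESCR column type are assumptions. SQLite does not accept parentheses around the members of a UNION, so they are omitted here, and ORDER BY is qualified with V1 to avoid any ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BLICK_ITEM (ID INTEGER);
CREATE TABLE BLICK_ITEM_DESCR (ITEM_ID INTEGER, ITEM_DESCR_NO INTEGER, ITEM_DESCR TEXT);
-- Assumed data: item 3 has no description rows.
INSERT INTO BLICK_ITEM VALUES (1), (2), (3);
INSERT INTO BLICK_ITEM_DESCR VALUES (1, 5, 'first'), (1, 6, 'second'), (2, 9, 'only');

CREATE VIEW BLICK_1_DESCR_NO_LIST AS
SELECT ITEM_ID, MIN(ITEM_DESCR_NO) AS MIN_I_D_NO, COUNT(ITEM_DESCR_NO) AS COUNT_I_D_NO
FROM BLICK_ITEM_DESCR GROUP BY ITEM_ID
UNION
SELECT ID, 0, 0 FROM BLICK_ITEM
LEFT JOIN BLICK_ITEM_DESCR ON BLICK_ITEM.ID = BLICK_ITEM_DESCR.ITEM_ID
WHERE ITEM_DESCR_NO IS NULL;
""")

# Shared outer part: join back to the description table.
TAIL = """
LEFT JOIN BLICK_ITEM_DESCR T1
  ON V1.ITEM_ID = T1.ITEM_ID AND V1.MIN_I_D_NO = T1.ITEM_DESCR_NO
ORDER BY V1.ITEM_ID
"""
HEAD = "SELECT V1.ITEM_ID, V1.MIN_I_D_NO, V1.COUNT_I_D_NO, T1.ITEM_DESCR FROM "

# Two-step form: query through the helper view.
two_step = conn.execute(HEAD + "BLICK_1_DESCR_NO_LIST V1" + TAIL).fetchall()

# Single-query form: the helper view's body inlined as a derived table.
inlined = conn.execute(HEAD + """(
    SELECT ITEM_ID, MIN(ITEM_DESCR_NO) AS MIN_I_D_NO,
           COUNT(ITEM_DESCR_NO) AS COUNT_I_D_NO
    FROM BLICK_ITEM_DESCR GROUP BY ITEM_ID
    UNION
    SELECT ID, 0, 0 FROM BLICK_ITEM
    LEFT JOIN BLICK_ITEM_DESCR ON BLICK_ITEM.ID = BLICK_ITEM_DESCR.ITEM_ID
    WHERE ITEM_DESCR_NO IS NULL
) V1""" + TAIL).fetchall()

print(two_step == inlined, inlined)
```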

Table alias name scope in sub-select query

Please have a look at the query below. I am getting an "invalid identifier" error for t1.oid in the inner query,
even though iclr_request (aliased as t1) does have a column oid.
select t1.requestNo
, t2.routeDistance
, (
select WM_CONCAT(crc7) as "TravCirc7s"
from (
select (
select crc7
from dim_afi_dnld_stn_v1
where stn_sys_nbr = t3.stn_sys_nbr
and rownum=1
) as crc7
from iclr_trav_circ7 t3
where request_oid = t1.oid -- invalid identifier reported here
and sub_route_index=0
and station_type_oid = 1
order by sequence
)
)
from iclr_request t1
, iclr_summary_results t2
where t1.oid = t2.request_oid
You can try this:
select t1.requestNo , t2.routeDistance,
WM_CONCAT((select crc7 from dim_afi_dnld_stn_v1 where stn_sys_nbr = t3.stn_sys_nbr and rownum=1)) as "TravCirc7s"
from iclr_request t1
join iclr_summary_results t2 on t1.oid = t2.request_oid
left join iclr_trav_circ7 t3 on t3.request_oid = t1.oid
and t3.sub_route_index=0
and t3.station_type_oid = 1
group by t1.requestNo , t2.routeDistance;
Correlated subqueries can refer to their parent query only one level up (although some Oracle documentation says the depth is unlimited).
EDIT: This does not preserve the order by sequence inside WM_CONCAT. You may need to wrap it in a parent query and then apply WM_CONCAT.

Find the common values in all subqueries

I have a bunch of subqueries that each return a bunch of records with two ID fields. I need to return a list of all ID pairs that exist in all subqueries. I was thinking I could do something like this:
SELECT Q1.V1, Q1.V2
FROM ( [SUBQUERY1] ) AS Q1
INNER JOIN ( [SUBQUERY2] ) AS Q2 ON Q2.V1 = Q1.V1 AND Q2.V2 = Q1.V2
INNER JOIN ( [SUBQUERY3] ) AS Q3 ON Q3.V1 = Q2.V1 AND Q3.V2 = Q2.V2
INNER JOIN ( [SUBQUERY4] ) AS Q4 ON Q4.V1 = Q3.V1 AND Q4.V2 = Q3.V2
Is there a better way?
Apparently the answer is to use INTERSECT:
[SUBQUERY1]
INTERSECT
[SUBQUERY2]
INTERSECT
[SUBQUERY3]
INTERSECT
[SUBQUERY4]
Nice!
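INTERSECT is probably better, but you can also use UNION ALL with GROUP BY and a HAVING COUNT(*) = number_of_subqueries clause, provided each subquery returns distinct pairs. (A plain UNION removes duplicates, so every pair would be counted only once.)
select v1, v2
from
(
(subquery_1)
union all
(subquery_2)
union all
(subquery_3)
) u
group by v1, v2
having count(*) = 3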
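The two approaches can be compared side by side. Below is a small sketch using Python's built-in sqlite3 module, with three hypothetical tables q1, q2 and q3 standing in for the subqueries. It assumes the counting variant uses UNION ALL, de-duplicates each input with DISTINCT, and requires the count to equal the number of inputs; without the DISTINCT, the duplicated (1, 1) in q1 would push that pair's count to 4 and it would be dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for the subqueries; q1 contains a duplicate pair.
CREATE TABLE q1 (v1 INTEGER, v2 INTEGER);
CREATE TABLE q2 (v1 INTEGER, v2 INTEGER);
CREATE TABLE q3 (v1 INTEGER, v2 INTEGER);
INSERT INTO q1 VALUES (1, 1), (2, 2), (3, 3), (1, 1);
INSERT INTO q2 VALUES (1, 1), (2, 2), (4, 4);
INSERT INTO q3 VALUES (1, 1), (2, 2), (3, 3);
""")

# INTERSECT keeps only the pairs present in every input.
intersect_rows = conn.execute("""
SELECT v1, v2 FROM q1
INTERSECT
SELECT v1, v2 FROM q2
INTERSECT
SELECT v1, v2 FROM q3
ORDER BY v1, v2
""").fetchall()

# Counting variant: a pair found in all three inputs appears exactly 3 times.
counted_rows = conn.execute("""
SELECT v1, v2
FROM (SELECT DISTINCT v1, v2 FROM q1
      UNION ALL
      SELECT DISTINCT v1, v2 FROM q2
      UNION ALL
      SELECT DISTINCT v1, v2 FROM q3) AS u
GROUP BY v1, v2
HAVING COUNT(*) = 3
ORDER BY v1, v2
""").fetchall()

print(intersect_rows, counted_rows)
```

The counting form is more verbose, but it generalizes to "present in at least k of the n subqueries" by changing the HAVING condition.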