How to rewrite SQL with multiple subqueries in Hive - hive

Here is a SQL query with multiple subqueries, written for Greenplum. I have to migrate it to Hive, and I don't know how to handle the subqueries in the WHERE clause.
select
t.ckid , t.prod_id , t.supp_num , t.wljhdh ,
sum(t.sssl) as zmkc , max(t.dj) as dj
from
%s t
where
exists (select 1
from dw_stage.wms_c_wlsjd w
where w.lydjh = t.wljhdh and w.lzztflag='上架确认'
and (ckid , kqid) in (select ckid , kqid
from dw_stage.jcxx_kqxx
where kqytsxid in ('2','3'))
)
and (t.ckid,t.supp_num) in (select cgck_stock_id,vndr_code from madfrog.cfg_vendor_dist where status=1 and send_method=2 and upper(purch_warehouse_type)='F')
and supp_num not in (select distinct vndr_code as supp_no from madfrog.cfg_vendor_dist where status=1 and send_method in (4,5))
group by t.ckid , t.prod_id , t.supp_num , t.wljhdh
Thank you for your tips.

You will need to convert the EXISTS subquery and the IN clauses to left outer joins.
Focusing on the structure:
select <cols list>
from <tabname> t
left outer join dw_stage.wms_c_wlsjd w
on w.lydjh = t.wljhdh
where w.lzztflag='上架确认'
The conditions
(t.ckid,t.supp_num) in (select .. )
and
supp_num not in (select distinct vndr_code as supp_no .. )
will also need to be rewritten as outer joins.
You can find more information about using outer joins in my answer to this other question: Hive command to execute NOT IN clause
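As a rough illustration of the join rewrite, here is a minimal sketch that runs on SQLite through Python rather than on Hive. The tables `stock`, `allowed` and `blocked` and their rows are hypothetical stand-ins for `%s`, the IN subquery and the NOT IN subquery; only the shape of the rewrite is meant to carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the tables in the question.
    CREATE TABLE stock   (ckid TEXT, supp_num TEXT, sssl INTEGER);
    CREATE TABLE allowed (ckid TEXT, supp_num TEXT);  -- plays the IN subquery
    CREATE TABLE blocked (supp_num TEXT);             -- plays the NOT IN subquery
    INSERT INTO stock   VALUES ('A','s1',5), ('A','s2',7), ('B','s1',3), ('B','s3',4);
    INSERT INTO allowed VALUES ('A','s1'), ('A','s1'), ('A','s2'), ('B','s3');
    INSERT INTO blocked VALUES ('s2');
""")

# Original shape: multi-column IN plus NOT IN in the WHERE clause.
subquery_form = """
    SELECT t.ckid, t.supp_num, SUM(t.sssl)
    FROM stock t
    WHERE (t.ckid, t.supp_num) IN (SELECT ckid, supp_num FROM allowed)
      AND t.supp_num NOT IN (SELECT supp_num FROM blocked)
    GROUP BY t.ckid, t.supp_num
    ORDER BY t.ckid, t.supp_num
"""

# Join shape: IN becomes a join against DISTINCT keys (so duplicate keys
# cannot inflate the SUM); NOT IN becomes LEFT OUTER JOIN ... IS NULL.
join_form = """
    SELECT t.ckid, t.supp_num, SUM(t.sssl)
    FROM stock t
    JOIN (SELECT DISTINCT ckid, supp_num FROM allowed) a
      ON a.ckid = t.ckid AND a.supp_num = t.supp_num
    LEFT OUTER JOIN (SELECT DISTINCT supp_num FROM blocked) b
      ON b.supp_num = t.supp_num
    WHERE b.supp_num IS NULL
    GROUP BY t.ckid, t.supp_num
    ORDER BY t.ckid, t.supp_num
"""

rows_subquery = conn.execute(subquery_form).fetchall()
rows_join = conn.execute(join_form).fetchall()
print(rows_subquery == rows_join, rows_join)
```

Two caveats on this sketch: the anti-join only matches NOT IN when the subquery column contains no NULLs, and a plain join needs the DISTINCT (or, in Hive, a LEFT SEMI JOIN) so that several matching rows do not multiply the aggregated rows.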

Related

LEFT JOIN & SUM GROUP BY

EDIT:
The result is supposed to look like this: [image: desired result]
I have this query:
SELECT DISTINCT mitarbeiter.mitarbnr, mitarbeiter.login, mitarbeiter.name1, mitarbeiter.name2
FROM vertragspos
left join vertrag_ek_vk_zuord ON vertragspos.id = vertrag_ek_vk_zuord.ek_vertragspos_id
left join mitarbeiter ON vertrag_ek_vk_zuord.anlage_mitarbnr = mitarbeiter.mitarbnr
left join vertragskopf ON vertragskopf.id = vertragspos.vertrag_id
left join
(
SELECT wkurse.*, fremdwaehrung.wsymbol
FROM wkurse
INNER join
(
SELECT lfdnr, Max(tag) AS maxTag
FROM wkurse
WHERE tag < SYSDATE
GROUP BY lfdnr
) t1
ON wkurse.lfdnr = t1.lfdnr AND wkurse.Tag = t1.maxTag
INNER JOIN fremdwaehrung ON wkurse.lfdnr = fremdwaehrung.lfdnr
) wkurse ON vertragskopf.blfdwaehrung = wkurse.lfdnr
left join
(
SELECT vertrag_ID, Sum (preis) preis, Sum (menge) menge, Sum (preis * menge / Decode (vertragskopf.zahlintervall, 1,1,2,2,3,3,4,6,5,12,1) / wkurse.kurs) vertragswert
FROM vertragspos
GROUP BY vertrag_ID
) s ON vertragskopf.id = s.vertrag_id
But I always get an error on line 21 Pos 145:
ORA-00904 WKURSE.KURS invalid identifier
The WKURSE table is already joined above, so why do I still get this error?
How can I do join with all these tables?
I need to join all these tables:
Mitarbeiter, Vertragspos, vertrag_ek_vk_zuord, wkurse, fremdwaehrung, vertragskopf.
What is the right syntax? I'm using SQL Tool 1,8 b38
Thank you.
Because the subquery in a LEFT JOIN is evaluated on its own, over the entire dataset, and not row by row against the other joined tables. So wkurse.kurs is not available in the execution context of the subquery. Since you join those tables anyway, you can move the calculation to the top-most SELECT statement.
EDIT:
After you edited the statement, it became clear where vertragskopf.zahlintervall comes from. I don't know where you are going to use the calculated vertragswert (it is now absent from the query), so I've put it in the result. I'm not a SQL parser and don't know your tables, so I cannot check the code, but the calculation can now be resolved (all the values are available in the calculation context).
SELECT DISTINCT mitarbeiter.mitarbnr, mitarbeiter.login, mitarbeiter.name1, mitarbeiter.name2, s.amount / Decode (vertragskopf.zahlintervall, 1,1,2,2,3,3,4,6,5,12,1) / wkurse.kurs AS vertragswert
FROM vertragspos
left join vertrag_ek_vk_zuord ON vertragspos.id = vertrag_ek_vk_zuord.ek_vertragspos_id
left join mitarbeiter ON vertrag_ek_vk_zuord.anlage_mitarbnr = mitarbeiter.mitarbnr
left join vertragskopf ON vertragskopf.id = vertragspos.vertrag_id
left join (
SELECT wkurse.*, fremdwaehrung.wsymbol
FROM wkurse
INNER join (
SELECT lfdnr, Max(tag) AS maxTag
FROM wkurse
WHERE tag < SYSDATE
GROUP BY lfdnr
) t1
ON wkurse.lfdnr = t1.lfdnr AND wkurse.Tag = t1.maxTag
INNER JOIN fremdwaehrung ON wkurse.lfdnr = fremdwaehrung.lfdnr
) wkurse ON vertragskopf.blfdwaehrung = wkurse.lfdnr
left join (
SELECT vertrag_ID, Sum (preis) preis, Sum (menge) menge, Sum (preis * menge) as amount
FROM vertragspos
GROUP BY vertrag_ID
) s ON vertragskopf.id = s.vertrag_id
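The scoping rule behind this fix can be reproduced on a small scale. The sketch below uses SQLite through Python instead of Oracle, and the tables `contract`, `rate` and `pos` are hypothetical, simplified stand-ins for vertragskopf, wkurse and vertragspos. It shows a derived table failing when it refers to a sibling join alias, and the same calculation succeeding once it is moved to the outer SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for the tables in the question.
    CREATE TABLE contract (id INTEGER, currency INTEGER);
    CREATE TABLE rate     (currency INTEGER, kurs REAL);
    CREATE TABLE pos      (contract_id INTEGER, preis REAL, menge REAL);
    INSERT INTO contract VALUES (1, 10);
    INSERT INTO rate     VALUES (10, 2.0);
    INSERT INTO pos      VALUES (1, 3.0, 4.0), (1, 1.0, 8.0);
""")

# The derived table refers to r.kurs, but r is a sibling in the outer FROM
# clause and is not visible inside the subquery.
broken = """
    SELECT c.id, s.wert
    FROM contract c
    LEFT JOIN rate r ON r.currency = c.currency
    LEFT JOIN (SELECT contract_id, SUM(preis * menge / r.kurs) AS wert
               FROM pos GROUP BY contract_id) s ON s.contract_id = c.id
"""
try:
    conn.execute(broken)
    broken_error = None
except sqlite3.OperationalError as exc:
    broken_error = str(exc)

# The derived table only aggregates; the division by the rate happens in the
# outer SELECT, where both s and r are in scope.
fixed = """
    SELECT c.id, s.amount / r.kurs AS wert
    FROM contract c
    LEFT JOIN rate r ON r.currency = c.currency
    LEFT JOIN (SELECT contract_id, SUM(preis * menge) AS amount
               FROM pos GROUP BY contract_id) s ON s.contract_id = c.id
"""
fixed_rows = conn.execute(fixed).fetchall()
print(broken_error)
print(fixed_rows)
```

Dividing after the SUM gives the same value as dividing inside it only because the rate is constant per contract, which is the case when it comes from a one-row-per-contract join as here.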
Rewriting the code with a WITH clause makes it much clearer than selecting from nested selects.
Also, getting the rate on the last day before today in Oracle is as simple as
select wkurse.lfdnr
, max(wkurse.kurs) keep (dense_rank first order by wkurse.tag desc) as rate
from wkurse
where tag < sysdate
group by wkurse.lfdnr
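For readers without an Oracle instance, the WITH-clause style can be tried out on SQLite through Python. This is only a sketch: SQLite has no KEEP (DENSE_RANK ...) aggregate, so the latest row is picked by joining back on MAX(tag), the same technique the question uses. The sample rows and ISO-formatted text dates are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed sample data; dates are stored as ISO text for the demo.
    CREATE TABLE wkurse (lfdnr INTEGER, tag TEXT, kurs REAL);
    INSERT INTO wkurse VALUES
        (1, '2020-01-01', 1.10), (1, '2020-01-03', 1.20),
        (2, '2020-01-02', 0.90), (2, '2999-01-01', 9.99);
""")

# The named subquery finds the last day before today per currency; the main
# query then reads the rate stored for exactly that day.
latest_rate_sql = """
    WITH last_day AS (
        SELECT lfdnr, MAX(tag) AS max_tag
        FROM wkurse
        WHERE tag < DATE('now')
        GROUP BY lfdnr
    )
    SELECT w.lfdnr, w.kurs AS rate
    FROM wkurse w
    JOIN last_day d ON d.lfdnr = w.lfdnr AND d.max_tag = w.tag
    ORDER BY w.lfdnr
"""
latest_rates = conn.execute(latest_rate_sql).fetchall()
print(latest_rates)
```

The future-dated row for currency 2 is ignored because of the `tag < DATE('now')` filter, which plays the role of `tag < sysdate` in the Oracle query.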
One option is a lateral join:
left join lateral
(SELECT vertrag_ID, Sum(preis) as preis, Sum(menge) as menge,
Sum (preis * menge / Decode (vertragskopf.zahlintervall, 1,1,2,2,3,3,4,6,5,12,1) / wkurse.kurs) vertragswert
FROM vertragspos
GROUP BY vertrag_ID
) s
ON vertragskopf.id = s.vertrag_id

SQL INTERSECT not supported in Phoenix: alternative for INTERSECT in Phoenix?

I have the following SQL expression:
SELECT SS_ITEM_SK AS POP_ITEM_SK
FROM (SELECT SS_ITEM_SK
FROM (SELECT SS_ITEM_SK,(ITEM_SOLD-ITEM_RETURNED) AS TOT_SOLD_QTY FROM (SELECT SS_ITEM_SK,COUNT(SS_ITEM_SK) AS ITEM_SOLD,COUNT(SR_ITEM_SK) AS ITEM_RETURNED FROM STORE_SALES1 right outer join STORE_RETURNS1 on SS_TICKET_NUMBER = SR_TICKET_NUMBER AND SS_ITEM_SK = SR_ITEM_SK GROUP BY SS_ITEM_SK)))
INTERSECT
SELECT CS_ITEM_SK AS POP_ITEM_SK FROM (SELECT CS_ITEM_SK
FROM (SELECT CS_ITEM_SK,(ITEM_SOLD-ITEM_RETURNED) AS TOT_SOLD_QTY FROM (SELECT CS_ITEM_SK,COUNT(CS_ITEM_SK) AS ITEM_SOLD,COUNT(CR_ITEM_SK) AS ITEM_RETURNED FROM CATALOG_SALES1 right outer join CATALOG_RETURNS1 on CS_ORDER_NUMBER = CR_ORDER_NUMBER and CS_ITEM_SK = CR_ITEM_SK GROUP BY CS_ITEM_SK)))
INTERSECT
SELECT WS_ITEM_SK AS POP_ITEM_SK FROM (SELECT WS_ITEM_SK
FROM (SELECT WS_ITEM_SK,(ITEM_SOLD-ITEM_RETURNED) AS TOT_SOLD_QTY FROM (SELECT WS_ITEM_SK,COUNT(WS_ITEM_SK) AS ITEM_SOLD,COUNT(WR_ITEM_SK) AS ITEM_RETURNED FROM WEB_SALES1 right outer join WEB_RETURNS1 on WS_ORDER_NUMBER = WR_ORDER_NUMBER AND WS_ITEM_SK = WR_ITEM_SK GROUP BY WS_ITEM_SK)))
Apache Phoenix does not support the INTERSECT keyword. Can somebody please help me rewrite the query above without using INTERSECT?
I think there are multiple ways you can do this:
Join Method
select * from ((query1 inner join query2 on column_names) inner join query3 on column_names)
Exists Method
(query1 where exists (query2 where exists (query3)) )
In Method
(query1 where column_name in (query2 where column_name in (query3)) )
References: https://blog.jooq.org/2015/10/06/you-probably-dont-use-sql-intersect-or-except-often-enough/
and http://phoenix.apache.org/subqueries.html
I would prefer EXISTS/IN over the join, though: if these queries return a lot of data, the join version may need to be optimized as described here:
https://phoenix.apache.org/joins.html
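The three methods can be compared against a real INTERSECT on a small data set. The sketch below uses SQLite through Python, not Phoenix, and the tables `store_items`, `catalog_items` and `web_items` are hypothetical stand-ins for the three nested subqueries in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the three subqueries in the question.
    CREATE TABLE store_items   (item_sk INTEGER);
    CREATE TABLE catalog_items (item_sk INTEGER);
    CREATE TABLE web_items     (item_sk INTEGER);
    INSERT INTO store_items   VALUES (1), (2), (2), (3);
    INSERT INTO catalog_items VALUES (2), (3), (4);
    INSERT INTO web_items     VALUES (3), (2), (5);
""")

queries = {
    # Reference result, using the operator that Phoenix lacks.
    "intersect": """
        SELECT item_sk FROM store_items
        INTERSECT SELECT item_sk FROM catalog_items
        INTERSECT SELECT item_sk FROM web_items
        ORDER BY item_sk
    """,
    # Join method: DISTINCT restores the set semantics of INTERSECT.
    "join": """
        SELECT DISTINCT s.item_sk
        FROM store_items s
        JOIN catalog_items c ON c.item_sk = s.item_sk
        JOIN web_items w ON w.item_sk = s.item_sk
        ORDER BY s.item_sk
    """,
    # Exists method: each EXISTS must be correlated on the compared column.
    "exists": """
        SELECT DISTINCT s.item_sk
        FROM store_items s
        WHERE EXISTS (SELECT 1 FROM catalog_items c WHERE c.item_sk = s.item_sk)
          AND EXISTS (SELECT 1 FROM web_items w WHERE w.item_sk = s.item_sk)
        ORDER BY s.item_sk
    """,
    # In method.
    "in": """
        SELECT DISTINCT item_sk
        FROM store_items
        WHERE item_sk IN (SELECT item_sk FROM catalog_items)
          AND item_sk IN (SELECT item_sk FROM web_items)
        ORDER BY item_sk
    """,
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
print(results)
```

One difference to keep in mind: INTERSECT treats two NULLs as the same value, while joins, EXISTS with `=` and IN never match NULLs, so the rewrites only agree when the compared column is free of NULLs.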

Error in nested SQL statement

Can someone help me fix this SQL statement? I have 2 tables... trying to get a list of all records in table 1 (c) along with a count (if any) of matching records in table 2 (cp_docs).
SELECT TOP 100 c.cal_procedure ,
c.description ,
c.active ,
c.create_user ,
c.create_date ,
c.edit_user ,
c.edit_date ,
c.id,
cp_docs.cpd
FROM cal_procedure c
OUTTER JOIN (select cal_procedure as cp, count(id) as cpd
from cal_procedure_doc
group by cal_procedure) cp_docs
ON cp_docs.cp = c.cal_procedure
Thanks,
Tracy
It's hard to say without the error message, but your outer join has a couple of issues:
OUTER is misspelled as OUTTER.
The OUTER keyword needs to be prefixed with LEFT, RIGHT or FULL. With the logic in your query you most likely want LEFT.
Fixed SQL:
SELECT TOP 100 c.cal_procedure ,
c.description ,
c.active ,
c.create_user ,
c.create_date ,
c.edit_user ,
c.edit_date ,
c.id,
cp_docs.cpd
FROM cal_procedure c
LEFT OUTER JOIN (select cal_procedure as cp, count(id) as cpd
from cal_procedure_doc
group by cal_procedure) cp_docs
ON cp_docs.cp = c.cal_procedure
Now, in your query you could get NULL values in the cpd column if there are no matching rows in the cal_procedure_doc table. With Max's answer, you would get zeros instead. If you want to keep your current approach but display zeros, you need to wrap cp_docs.cpd in a COALESCE function:
coalesce(cp_docs.cpd, 0)
In the end, I think Max's answer is easier to read, and it is probably the way I would write this query. If the tables are huge, you may want to check how each version performs to see whether one is better than the other.
You can just add a subquery to the SELECT clause. It's cleaner than joining a derived table. If you read someone else's query to figure out how a calculation is done, you start with the SELECT clause. If it points you to a table alias (e.g. cp_docs), you then have to find that table in the FROM clause, and so on. The execution plans are almost identical; the proposed SELECT clause subquery actually eliminates one innocuous Compute Scalar step.
SELECT c.cal_procedure ,
c.description ,
c.active ,
c.create_user ,
c.create_date ,
c.edit_user ,
c.edit_date ,
c.id,
(SELECT COUNT(*) FROM cal_procedure_doc where cal_procedure = c.cal_procedure) AS cpd
FROM cal_procedure c
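To make the NULL versus zero difference between the two approaches concrete, here is a minimal sketch on SQLite through Python (not SQL Server, so TOP is left out). The two sample procedures and their documents are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up sample rows: P1 has two documents, P2 has none.
    CREATE TABLE cal_procedure     (id INTEGER, cal_procedure TEXT);
    CREATE TABLE cal_procedure_doc (id INTEGER, cal_procedure TEXT);
    INSERT INTO cal_procedure     VALUES (1, 'P1'), (2, 'P2');
    INSERT INTO cal_procedure_doc VALUES (10, 'P1'), (11, 'P1');
""")

# LEFT OUTER JOIN to a grouped derived table: the raw count is NULL for
# procedures without documents, COALESCE turns that into 0.
join_rows = conn.execute("""
    SELECT c.cal_procedure, cp_docs.cpd, COALESCE(cp_docs.cpd, 0)
    FROM cal_procedure c
    LEFT OUTER JOIN (SELECT cal_procedure AS cp, COUNT(id) AS cpd
                     FROM cal_procedure_doc
                     GROUP BY cal_procedure) cp_docs
      ON cp_docs.cp = c.cal_procedure
    ORDER BY c.cal_procedure
""").fetchall()

# Correlated subquery in the SELECT clause: COUNT(*) over no rows is 0.
subquery_rows = conn.execute("""
    SELECT c.cal_procedure,
           (SELECT COUNT(*) FROM cal_procedure_doc d
            WHERE d.cal_procedure = c.cal_procedure) AS cpd
    FROM cal_procedure c
    ORDER BY c.cal_procedure
""").fetchall()

print(join_rows)
print(subquery_rows)
```

The inner table is given the alias `d` here so that the correlation `d.cal_procedure = c.cal_procedure` is explicit about which side each column comes from.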
Perhaps you want OUTER APPLY:
SELECT TOP 100 c.cal_procedure, c.description, c.active, c.create_user,
c.create_date, c.edit_user, c.edit_date, c.id, cp_docs.cpd
FROM cal_procedure c OUTER APPLY
(select count(id) as cpd
from cal_procedure_doc
where cal_procedure = c.cal_procedure
) cp_docs
ORDER BY ? ? ? ;

How can I convert a SQL query with a derived table to HQL?

SELECT *
FROM visitdetails vd
LEFT JOIN
(SELECT MAX(id) AS id, VisitID
FROM claimfilelist GROUP BY VisitID) cf ON cf.visitid = vd.Id
LEFT JOIN claimfilelist cf1 ON cf1.id = cf.id
I have this SQL query. How can I convert it to HQL?
The HQL documentation says that subqueries are only allowed in the SELECT and WHERE clauses. So my first step is to move the subquery to the WHERE clause:
SELECT *
FROM visitdetails vd
LEFT JOIN claimfilelist cf ON cf.visitid = vd.id
WHERE cf.id IS NULL OR cf.id = (
SELECT max(cfInner.id)
FROM claimfilelist cfInner
WHERE cfInner.visitId = vd.id
)
Depending on your Hibernate version, you might need to change the joins. I am not sure whether the query works as written, but you could give the approach a try.
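The rewrite can at least be sanity-checked at the SQL level. The sketch below runs both shapes as plain SQL on SQLite through Python; it does not exercise HQL or Hibernate, and the sample rows are made up (one visit with two claim files, one visit with none).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up sample rows: visit 1 has two claim files, visit 2 has none.
    CREATE TABLE visitdetails  (id INTEGER);
    CREATE TABLE claimfilelist (id INTEGER, visitid INTEGER);
    INSERT INTO visitdetails  VALUES (1), (2);
    INSERT INTO claimfilelist VALUES (100, 1), (101, 1);
""")

# Original shape: derived table with MAX(id) per visit in the FROM clause.
derived_rows = conn.execute("""
    SELECT vd.id, cf1.id
    FROM visitdetails vd
    LEFT JOIN (SELECT MAX(id) AS id, visitid
               FROM claimfilelist GROUP BY visitid) cf ON cf.visitid = vd.id
    LEFT JOIN claimfilelist cf1 ON cf1.id = cf.id
    ORDER BY vd.id
""").fetchall()

# Rewritten shape: plain LEFT JOIN, with the MAX(id) subquery moved into the
# WHERE clause; "cf.id IS NULL" keeps visits that have no claim file at all.
where_rows = conn.execute("""
    SELECT vd.id, cf.id
    FROM visitdetails vd
    LEFT JOIN claimfilelist cf ON cf.visitid = vd.id
    WHERE cf.id IS NULL
       OR cf.id = (SELECT MAX(i.id) FROM claimfilelist i
                   WHERE i.visitid = vd.id)
    ORDER BY vd.id
""").fetchall()

print(derived_rows)
print(where_rows)
```

Both shapes return the newest claim file per visit and keep visits without any claim file, which is the behavior the HQL version is meant to preserve.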

SQL Query PIVOT to MS Access SQL Query

I have this query on SQL Server
;WITH tmpTbl AS
(SELECT Kit_Tbl.Kit_Number
,Kit_Tbl.Kit_Refrigerant
,CompType_Tbl.Component_Type
,Comp_List_Tbl.Component_Num
FROM Kit_Tbl
INNER JOIN Kit_Library
ON Kit_Library.Kit_Number = Kit_Tbl.Kit_Number
INNER JOIN CompType_Tbl
ON CompType_Tbl.Component_Type = Kit_Library.Component_Type
INNER JOIN Comp_List_Tbl
ON Comp_List_Tbl.Component_Type = CompType_Tbl.Component_Type)
select Kit_Number
, Kit_Refrigerant
, [Compressor]
, [Condensing Unit]
from
(
select Kit_Number, Component_Type, Component_Num, Kit_Refrigerant
from tmpTbl
) d
pivot
(
max(Component_Num)
for Component_Type in ([Compressor], [Condensing Unit])
) piv;
I tried converting it to an MS Access query, but I get a syntax error in the TRANSFORM statement:
TRANSFORM MAX(Comp_List_Tbl.Component_Num) AS Comp_Num
SELECT Kit_Tbl.Kit_Number,
CompType_Tbl.Component_Type,MAX(Comp_List_Tbl.Component_Num)
FROM Comp_List_Tbl INNER JOIN (Kit_Tbl INNER JOIN (Kit_Library INNER JOIN
CompType_Tbl ON Kit_Library.Component_Type = CompType_Tbl.Component_Type) ON
Kit_Tbl.Kit_Number = Kit_Library.Kit_Number) ON (CompType_Tbl.Component_Type =
Comp_List_Tbl.Component_Type);
GROUP BY Kit_Tbl.Kit_Number
PIVOT IN CompType_Tbl.Component_Type
Can anyone help me with this?
In your last line:
PIVOT CompType_Tbl.Component_Type
No IN is required.
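For checking what the crosstab should return, independent of the Access syntax, the same result can be built with conditional aggregation. This is a sketch on SQLite through Python, not an Access query; the flat table `kit_components` and its rows are hypothetical and stand in for the joined Kit_Tbl, Kit_Library, CompType_Tbl and Comp_List_Tbl data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flat table standing in for the joined source tables.
    CREATE TABLE kit_components (kit_number TEXT, component_type TEXT, component_num TEXT);
    INSERT INTO kit_components VALUES
        ('K1', 'Compressor',      'C-100'),
        ('K1', 'Condensing Unit', 'U-200'),
        ('K2', 'Compressor',      'C-300');
""")

# One MAX(CASE ...) per pivoted value: each output column only sees the rows
# of its own component type, and gets NULL when the kit has none.
pivot_sql = """
    SELECT kit_number,
           MAX(CASE WHEN component_type = 'Compressor'
                    THEN component_num END) AS compressor,
           MAX(CASE WHEN component_type = 'Condensing Unit'
                    THEN component_num END) AS condensing_unit
    FROM kit_components
    GROUP BY kit_number
    ORDER BY kit_number
"""
pivot_rows = conn.execute(pivot_sql).fetchall()
print(pivot_rows)
```

The row layout (one row per kit, one column per component type) is what both the SQL Server PIVOT and the Access TRANSFORM ... PIVOT versions are expected to produce for such data.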