Impala Error : AnalysisException: Multiple subqueries are not supported in expression: (CASE *** WHEN THEN) - sql

I have a where Clause that I need to check if values exists in a table, and I'm doing that in a (subquery). The problem is, that should be made based on
values - 'FIX' and 'VAR'. Depending on each, we need to check on a different table (subquery). To achieve that goal I'm using a Case When statement in the where clause, as shown below:
select *
FROM T1
where
(upper(trim(ITAXAVAR)) = 'S'
and
(
upper(trim(CTIPAMOR)) not in ('A','U','F')
)
)
and
--problem starts here.....
(case ucase(trim(CTIPTXFX)) --Values 'FIX';'VAR';'PUR'
WHEN 'FIX'
THEN
(concat(trim(CPRZTXFX),trim(CTAXAREF)) not in
(select trim(A.tayd91c0_celemtab)
from cd_estruturais.tat91_tabelas A
where A.tayd91c0_ctabela = 'W03' and
--data_date_part = '${Data_ref}' and --por vezes não temos actualização TAT91 para mesma data_ref das tabelas
A.data_date_part = (select max(B.data_date_part)
from cd_estruturais.tat91_tabelas B
where A.tayd91c0_ctabela = B.tayd91c0_ctabela and
B.data_date_part > date_add(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())),-5)
)
and length(nvl(trim(A.tayd91c0_celemtab),'')) <> 0
)
)
WHEN 'VAR'
THEN
(concat(trim(CTAXAREF),trim(CPERRVTX)) not in
(select concat(trim(A.CTXREF),trim(A.CPERRVTX))
from land_estruturais.cat01_taxref A
where A.data_date_part > date_add(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())),-5)
and length(nvl(concat(trim(A.CTXREF),trim(A.CPERRVTX)),'')) <> 0
)
)
END
)
;
Below is a simplified view of the same query:
select *
FROM T1
where
(--first criteria
)
and
--problem starts here.....
(case ucase(trim(CTIPTXFX)) --Values 'FIX';'VAR';'PUR'
WHEN 'FIX'
THEN
(field1 not in
(subquery 1)
)
WHEN 'VAR'
THEN
(field1 not in
(subquery 2)
END
)
;
Can anyone tell me what I'm doing wrong, please?
I seems to me that Impala does not support the subqueries inside a Case When Statement.
Thank you.

Impala doesnt support Subqueries in the select list.
So, you need to rewrite the SQL like below -
Use LEFT ANTI JOIN in place of NOT IN() to link subqueries to T1.
To handle case when, use UNION ALL for different conditions.
SELECT * FROM T1
LEFT ANTI JOIN subqry1 y ON T1.id = y.id
WHERE col='FIX'
UNION ALL
SELECT * FROM T1
LEFT ANTI JOIN subqry2 y ON T1.id = y.id
WHERE col='VAR'
I tried to change the simple SQL you posted above. The main SQL is too complex and need table setup and data to prove the logic.
Here is my version of your simple SQL -
select * FROM T1
LEFT ANTI JOIN subquery1 ON subquery1.column = T1.field1
where (--first criteria )
and ucase(trim(CTIPTXFX))='FIX'
UNION ALL
select * FROM T1
LEFT ANTI JOIN subquery2 ON subquery2.column = T1.field1
where (--first criteria )
and ucase(trim(CTIPTXFX))='VAR'
Pls note, Anti join and union all can be expensive so if your table size if huge, please tune them accordingly.

Related

Oracle SQL Developer - error when referencing fields within nested from statements

For the below query I am getting an error with line 4 when referencing variables within "y". The query runs successfully when I use just " y.* " (line 5), however it generates an error when I try to also pull from the specified fields in line 4 (y.field1 as PRODUCT, y.field2 as PRODUCT_TYPE, y.entity, y.TYPE1). For the output, I want these fields listed first for visual reference.
I have this approach/ logic working for other queries (as i'm re using this logic for multiple variations of queries and various tables). However, I think that the issue with this one lies in my attempt to reference fields from tables that are in my join statements.
(
select
-- categorization fields:
-- table2.field1 as PRODUCT, table2.field2 as PRODUCT_TYPE, table3.entity, table3.TYPE1
y.field1 as PRODUCT,
y.field2 as PRODUCT_TYPE,
y.entity,
y.TYPE1
,y.*
from (
select *
from (
-- table references:
select table1.*,
row_number() over (
partition by
-- categorization fields:
table2.field1,
table2.field2,
table3.entity,
table3.TYPE1
order by table3.entity
) as rn
-- table references
from table1
-- joins, links, and filtering:
inner join table6 on table1.field_1 = table6.code1
inner join table5 on (table6.code = table5.code1)
AND (table6.code = table5.code)
left join table3 on table6.ent1 = table3.ent_code
left join table2 on table1.extid = table2.extID
where table1.tdate between '01-APR-19' and '01-APR-21'
AND table1.refe NOT IN ('OFF')
) x
-- sample rows:
where rn <= 2
) y
);
Let me know if anyone has a way that I can maybe better specify which tables those fields come from. I wish I could just do something like this:
y.table2.field1 as PRODUCT,
y.table2.field2 as PRODUCT_TYPE,
y.table3.entity,
y.table3.TYPE1
Sorry that I don't have a fiddle available!
Let me know if anyone has a way that I can maybe better specify which tables those fields come from.
Don't use select *. Instead, use the column names and give them appropriate aliases so you know where they came from:
As an example:
SELECT small_value,
medium_value,
big_value
FROM (
SELECT small.value AS small_value,
medium.value AS medium_value,
big.value AS big_value
FROM big
CROSS JOIN medium
CROSS JOIN small
)
WHERE 1 = 1
In your query, instead of using SELECT * in y or using SELECT table1.* in x you can name the columns and give them descriptive aliases.
I am getting an error with line 4 when referencing variables within "y".
(
select
-- categorization fields:
-- table2.field1 as PRODUCT, table2.field2 as PRODUCT_TYPE, table3.entity, table3.TYPE1
That is because you cannot see TABLE2 or TABLE3 because the only "view" you are looking at is of the sub-query with the alias y.
If you want to see those columns then you need to SELECT them inside the x subquery and pass them to each subsequent outer-query.
(
select *
from (
-- table references:
select table1.field1 AS t1_product,
table1.field2 AS t1_product_type,
table1.entity AS t1_entity,
table1.type1 AS t1_type1,
table2.field1 AS t2_product,
table2.field2 AS t2_product_type,
table2.entity AS t2_entity,
table2.type1 AS t2_type1,
table3.field1 AS t3_product,
table3.field2 AS t3_product_type,
table3.entity AS t3_entity,
table3.type1 AS t3_type1,
row_number() over (
partition by
-- categorization fields:
table2.field1,
table2.field2,
table3.entity,
table3.TYPE1
order by table3.entity
) as rn
-- table references
from table1
-- joins, links, and filtering:
inner join table6 on table1.field_1 = table6.code1
inner join table5 on (table6.code = table5.code1)
AND (table6.code = table5.code)
left join table3 on table6.ent1 = table3.ent_code
left join table2 on table1.extid = table2.extID
where table1.tdate between '01-APR-19' and '01-APR-21'
AND table1.refe NOT IN ('OFF')
) x
-- sample rows:
where rn <= 2
);

rewrite query replacing not in

I want to re-write the below query in a way to remove NOT IN is that possible ?
select * from TRX_T TT, TRX_SUB TS
where TT.CODE=TS.CODE
and TT.SUBID= TS.ID
and TS.VALUE=1
and TS.CODE=1
AND TS.ID=17
AND TT.STATUS NOT IN('T','R','C')
If the status was IN I would then use union all
The reason I want to re-write because the below oracle recommendation.
The predicate "TT"."STATUS"<>'C' used at line ID 5 of the execution
plan contains an expression on indexed column "STATUS". This
expression prevents the optimizer from efficiently using indices on
table TT.
Number of Distinct values
T 264
C 5489709
D 2987
J 924
L 529430
R 39382
S 5449
The index on TRX_T is like this: (CODE,SUBID,TYPE,STATUS,NO_SL)
If you want to tune a query you need to understand your data model and your data. The optimiser says it cannot efficiently use an index on TRX_T. Let's look at that compound index:
CODE : used in join criteria
SUBID : used in join criteria
TYPE : not used
STATUS : could be used as filter ?
NO_SL : not used
Your query uses three of the five indexed columns. But because you have a NOT IN expression on STATUS the optimiser doesn't use the index to evaluate the filter. So it reads every record in TRX_T which matches a record in TRX_SUB and evaluates the filter on the table.
Perhaps if you expressed the condition positively as TT.STATUS IN ('D','J','L', 'S') then the optimizer might be able to use a SKIP SCAN to evaluate the filter on the index.
However, the index usage would be more efficient if the TRX_T.TYPE were used as filter ( or if the order of the index columns were re-arranged to have STATUS before TYPE but don't do this as it might destabilise other queries).
Another option would be to rewrite the expression as a NOT IN subquery (if you have no null values in (TRX_T.CODE, TRX_T.SUBID) otherwise as a NOT EXISTS subquery):
select * from TRX_T TT, TRX_SUB TS
where TT.CODE=TS.CODE
and TT.SUBID= TS.ID
and TS.VALUE=1
and TS.CODE=1
AND TS.ID=17
AND (TT.CODE, TT.SUBID) NOT IN
(select x.CODE, x.SUBID
from trx_t x
where x.status in ('T','R','C')
)
However, the number of TRX_T records which have STATUS values in that list is very large - they are the majority of your table - so evaluating that subquery might well be more expensive than what you have at the moment.
Please note, the usual caveats apply. Tuning queries on StackOverflow is a mug's game. Too much information is missing (data volumes, skew, other indexes, explain plans, etc) for us to to do anything other than guess.
use explicit join and not in could be replaced by below way
select * from TRX_T TT
join TRX_SUB TS
on TT.CODE=TS.CODE and TT.SUBID= TS.ID
where TS.VALUE=1
and TS.CODE=1
AND TS.ID=17
and not exists
( select 1 from TRX_T t1 where t1.CODE=TT.code
and (t1.STATUS ='T' OR t1.STATUS='R' or t1.STATUS='C')
)
You can try using left join and filter out TT2.status is null so it will give you those records which are not in ('T','R','C')
select * from TRX_T TT
join TRX_SUB TS on TT.CODE=TS.CODE and TT.SUBID= TS.ID
left join (select * from TRX_T where STATUS in ('T','R','C')) TT2 on
TT.CODE=TT2.CODE and TT.SUBID= TT2.SUBID
where TS.VALUE=1 and TS.CODE=1 AND TS.ID=17 and TT2.status is null
You can use minus set operator as
select *
from TRX_T TT
join TRX_SUB TS
on ( TT.CODE = TS.CODE
and TT.SUBID = TS.ID )
where TS.VALUE = 1
and TS.CODE = 1
and TS.ID = 17
minus
select *
from TRX_T TT
join TRX_SUB TS
on ( TT.CODE = TS.CODE
and TT.SUBID = TS.ID )
where TS.VALUE = 1
and TS.CODE = 1
and TS.ID = 17
and TT.STATUS in ('T', 'R', 'C');
P.S. Yes, Not in is mostly problematic from the performance view, and as a side not using nvl function shouldn't be forgotten when using Not in.
Try the below, you can use 'except':
SELECT *
FROM trx_t TT,
trx_sub TS
WHERE TT.code = TS.code
AND TT.subid = TS.id
AND TS.value = 1
AND TS.code = 1
AND TS.id = 17
EXCEPT
SELECT *
FROM trx_t TT,
trx_sub TS
WHERE TT.status = 'T'
OR TT.status = 'R'
OR TT.status = 'C'

Check if a combination of fields already exists in the table

My weakest area of SQL are self JOINS, currently struggling with an issue.
I need to find the latest entry in a table, I'm using a WHERE DATEFIELD IN (SELECT MAX(DATEFIELD) FROM TABLE) to do this. I then need to establish if 3 columns from that already exist in the same TABLE.
My latest attempt looks like this -
SELECT * FROM PART_TABLE
WHERE NOT EXISTS
(
SELECT
t1.DATEFIELD
t1.CODE1
t1.CODE2
t1.CODE3
FROM PART_TABLE t1
INNER JOIN PART_TABLE t2 ON t1.UNIQUE = t2.UNQIUE
)
WHERE t1.DATEFIELD IN
(
SELECT MAX(DATEFIELD)
FROM PARTTABLE
)
)
I think part of the issue is that I can't exclude the unique row from t1 when checking in t2 using this method.
Using MSSQL 2014.
The following query will return the latest record from your table and a bit flag whether a duplicate tuple {Code1, Code2, Code3} exists in it under a different identifier:
select top (1) p.*,
case when exists (
select 0 from dbo.Part_Table t where t.Unique != p.Unique
and t.Code1 = p.Code1 and t.Code2 = p.Code2 and t.Code3 = p.Code3
) then 1
else 0 end as [IsDuplicateExists]
from dbo.Part_Table p
order by p.DateField desc;
You can use this example as a template to address your specific needs, which unfortunately aren't immediately apparent from your explanation.

Oracle SQL XOR condition with > 14 tables

I have a question on sql desgin.
Context:
I have a table called t_master and 13 other tables (lets call them a,b,c... for simplicity) where it needs to compared.
Logic:
t_master will be compared to table 'a' where t_master.gen_val =
a.value.
If record exist in t_master, retrieve t_master record, else retrieve 'a' record.
I do not need to retrieve the records if it exists in both tables (t_master and a) - XOR condition
Repeat this comparison with the remaining 12 tables.
I have some idea on doing this, using WITH to subquery the non-master tables (a,b,c...) first with their respective WHERE clause.
Then use XOR statement to retrieve the records.
Something like
WITH a AS (SELECT ...),
b AS (SELECT ...)
SELECT field1,field2...
FROM t_master FULL OUTER JOIN a FULL OUTER JOIN b FULL OUTER JOIN c...
ON t_master.gen_value = a.value
WHERE ((field1 = x OR field2 = y ) AND NOT (field1 = x AND field2 = y))
AND ....
.
.
.
.
Seeing that I have 13 tables that I need to full outer join, is there a better way/design to handle this?
Otherwise I would have at least 2*13 lines of WHERE clause which I'm not sure if that will have impact on the performance as t_master is sort of a log table.
**Assume I cant change any schema.
Currently I'm not sure if this SQL will working correctly yet, so I'm hoping someone can guide me in the right direction regarding this.
update from used_by_already's suggestion:
This is what I'm trying to do (comparison between 2 tables first, before I add more, but I am unable to get values from ATP_R.TBL_HI_HDR HI_HDR as it is in the NOT EXISTS subquery.
How do i overcome this?
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO JOIN ATP_R.TBL_HI_HDR HI_HDR ON LOG_REPO.GEN_VAL = HI_HDR.HI_NO
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_R.TBL_HI_HDR HI_HDR
WHERE LOG_REPO.GEN_VAL = HI_HDR.HI_NO
)
UNION ALL
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_R.TBL_HI_HDR HI_HDR JOIN ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO ON HI_HDR.HI_NO = LOG_REPO.GEN_VAL
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO
WHERE HI_HDR.HI_NO = LOG_REPO.GEN_VAL
)
Full outer joins used to exclude all matching rows can be an expensive query. You don't supply much detail, but perhaps using NOT EXISTS would be simpler and maybe it will produce a better explain plan. Something along these lines.
select
cola,colb,colc
from t_master m
where not exists (
select null from a where m.keycol = a.fk_to_m
)
and not exists (
select null from b where m.keycol = b.fk_to_m
)
and not exists (
select null from c where m.keycol = c.fk_to_m
)
union all
select
cola,colb,colc from a
where not exists (
select null from t_master m where a.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from b
where not exists (
select null from t_master m where b.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from c
where not exists (
select null from t_master m where c.fk_to_m = m.keycol
)
You could union the 13 a,b,c ... tables to simplify the coding, but that may not perform so well.

How to convert SUBSELECT with TOP and ORDER BY to JOIN

I have a working sql select, which looks like this
[Edited: Im sorry i did one mistake in the question, i edited alias of Table1 but im trying the answers]
SELECT
m.Column1
,t2.Column2
,COALESCE
(
(
SELECT TOP 1 Vat
FROM LinkedDBServer.DatabaseName.dbo.TableName t3
WHERE
m.MaterialNumber = t3.MaterialNumber COLLATE Czech_CI_AS
and t3.Currency = …
and ...
ORDER BY [Date] DESC
), m.Vat
) as Vat
FROM Table1 m
JOIN Table2 t2 on (m.Column1 = t2.Column1)
It works but the problem is that it takes too long and LinkedServer cut my connection because it takes more than 10 minutes. The purpose of the query is to get newer data from a different database if it exists (i get newest data by top and ordering it by date and precondition is that every data in that database is newer than in mine, thats why im using COALESCE).
But my though is if I was able to rewrite it to JOIN it could be faster. But another problem could be I dont have an primary key (and cant change that).
How can I speed that query up ? (Im using SQL Server 2008 R2)
Thank you
Here i attached Estimated Query Plan: (Its readable in browser ZOOM :) Estimation is for 2 Coalesce columns.
Try rewriting query using outer apply
SELECT
t1.Column1
,t2.Column2
,COALESCE(ou.vat, m.Vat) as Vat
FROM Table1 t1
JOIN Table2 m on (m.Column1 = t1.Column1)
outer apply
(
SELECT TOP 1 Vat
FROM LinkedDBServer.DatabaseName.dbo.TableName t3
WHERE
m.MaterialNumber = t3.MaterialNumber COLLATE Czech_CI_AS
and t3.Currency = …
and ...
ORDER BY [Date] DESC
) ou
Another option:
; WITH vat AS (
SELECT MaterialNumber COLLATE Czech_CI_AS As MaterialNumber
, Vat
, Row_Number() OVER (PARTITION BY MaterialNumber ORDER BY "Date" DESC) As sequence
FROM LinkedDBServer.DatabaseName.dbo.TableName
WHERE Currency = ...
AND ...
)
SELECT t1.Column1
, m.Column2
, Coalesce(vat.Vat, m.Vat) As Vat
FROM Table1 As t1
INNER
JOIN Table2 As m
ON m.Column1 = t1.Column1
LEFT
JOIN vat
ON vat.MaterialNumber = m.MaterialNumber
AND vat.sequence = 1
;