rewrite query replacing not in - sql

I want to re-write the below query in a way to remove NOT IN is that possible ?
select * from TRX_T TT, TRX_SUB TS
where TT.CODE=TS.CODE
and TT.SUBID= TS.ID
and TS.VALUE=1
and TS.CODE=1
AND TS.ID=17
AND TT.STATUS NOT IN('T','R','C')
If the status was IN I would then use union all
The reason I want to re-write because the below oracle recommendation.
The predicate "TT"."STATUS"<>'C' used at line ID 5 of the execution
plan contains an expression on indexed column "STATUS". This
expression prevents the optimizer from efficiently using indices on
table TT.
Number of Distinct values
T 264
C 5489709
D 2987
J 924
L 529430
R 39382
S 5449
The index on TRX_T is like this: (CODE,SUBID,TYPE,STATUS,NO_SL)

If you want to tune a query you need to understand your data model and your data. The optimiser says it cannot efficiently use an index on TRX_T. Let's look at that compound index:
CODE : used in join criteria
SUBID : used in join criteria
TYPE : not used
STATUS : could be used as filter ?
NO_SL : not used
Your query uses three of the five indexed columns. But because you have a NOT IN expression on STATUS the optimiser doesn't use the index to evaluate the filter. So it reads every record in TRX_T which matches a record in TRX_SUB and evaluates the filter on the table.
Perhaps if you expressed the condition positively as TT.STATUS IN ('D','J','L', 'S') then the optimizer might be able to use a SKIP SCAN to evaluate the filter on the index.
However, the index usage would be more efficient if the TRX_T.TYPE were used as filter ( or if the order of the index columns were re-arranged to have STATUS before TYPE but don't do this as it might destabilise other queries).
Another option would be to rewrite the expression as a NOT IN subquery (if you have no null values in (TRX_T.CODE, TRX_T.SUBID) otherwise as a NOT EXISTS subquery):
select * from TRX_T TT, TRX_SUB TS
where TT.CODE=TS.CODE
and TT.SUBID= TS.ID
and TS.VALUE=1
and TS.CODE=1
AND TS.ID=17
AND (TT.CODE, TT.SUBID) NOT IN
(select x.CODE, x.SUBID
from trx_t x
where x.status in ('T','R','C')
)
However, the number of TRX_T records which have STATUS values in that list is very large - they are the majority of your table - so evaluating that subquery might well be more expensive than what you have at the moment.
Please note, the usual caveats apply. Tuning queries on StackOverflow is a mug's game. Too much information is missing (data volumes, skew, other indexes, explain plans, etc) for us to to do anything other than guess.

use explicit join and not in could be replaced by below way
select * from TRX_T TT
join TRX_SUB TS
on TT.CODE=TS.CODE and TT.SUBID= TS.ID
where TS.VALUE=1
and TS.CODE=1
AND TS.ID=17
and not exists
( select 1 from TRX_T t1 where t1.CODE=TT.code
and (t1.STATUS ='T' OR t1.STATUS='R' or t1.STATUS='C')
)

You can try using left join and filter out TT2.status is null so it will give you those records which are not in ('T','R','C')
select * from TRX_T TT
join TRX_SUB TS on TT.CODE=TS.CODE and TT.SUBID= TS.ID
left join (select * from TRX_T where STATUS in ('T','R','C')) TT2 on
TT.CODE=TT2.CODE and TT.SUBID= TT2.SUBID
where TS.VALUE=1 and TS.CODE=1 AND TS.ID=17 and TT2.status is null

You can use minus set operator as
select *
from TRX_T TT
join TRX_SUB TS
on ( TT.CODE = TS.CODE
and TT.SUBID = TS.ID )
where TS.VALUE = 1
and TS.CODE = 1
and TS.ID = 17
minus
select *
from TRX_T TT
join TRX_SUB TS
on ( TT.CODE = TS.CODE
and TT.SUBID = TS.ID )
where TS.VALUE = 1
and TS.CODE = 1
and TS.ID = 17
and TT.STATUS in ('T', 'R', 'C');
P.S. Yes, Not in is mostly problematic from the performance view, and as a side not using nvl function shouldn't be forgotten when using Not in.

Try the below, you can use 'except':
SELECT *
FROM trx_t TT,
trx_sub TS
WHERE TT.code = TS.code
AND TT.subid = TS.id
AND TS.value = 1
AND TS.code = 1
AND TS.id = 17
EXCEPT
SELECT *
FROM trx_t TT,
trx_sub TS
WHERE TT.status = 'T'
OR TT.status = 'R'
OR TT.status = 'C'

Related

Impala Error : AnalysisException: Multiple subqueries are not supported in expression: (CASE *** WHEN THEN)

I have a where Clause that I need to check if values exists in a table, and I'm doing that in a (subquery). The problem is, that should be made based on
values - 'FIX' and 'VAR'. Depending on each, we need to check on a different table (subquery). To achieve that goal I'm using a Case When statement in the where clause, as shown below:
select *
FROM T1
where
(upper(trim(ITAXAVAR)) = 'S'
and
(
upper(trim(CTIPAMOR)) not in ('A','U','F')
)
)
and
--problem starts here.....
(case ucase(trim(CTIPTXFX)) --Values 'FIX';'VAR';'PUR'
WHEN 'FIX'
THEN
(concat(trim(CPRZTXFX),trim(CTAXAREF)) not in
(select trim(A.tayd91c0_celemtab)
from cd_estruturais.tat91_tabelas A
where A.tayd91c0_ctabela = 'W03' and
--data_date_part = '${Data_ref}' and --por vezes não temos actualização TAT91 para mesma data_ref das tabelas
A.data_date_part = (select max(B.data_date_part)
from cd_estruturais.tat91_tabelas B
where A.tayd91c0_ctabela = B.tayd91c0_ctabela and
B.data_date_part > date_add(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())),-5)
)
and length(nvl(trim(A.tayd91c0_celemtab),'')) <> 0
)
)
WHEN 'VAR'
THEN
(concat(trim(CTAXAREF),trim(CPERRVTX)) not in
(select concat(trim(A.CTXREF),trim(A.CPERRVTX))
from land_estruturais.cat01_taxref A
where A.data_date_part > date_add(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())),-5)
and length(nvl(concat(trim(A.CTXREF),trim(A.CPERRVTX)),'')) <> 0
)
)
END
)
;
Below is a simplified view of the same query:
select *
FROM T1
where
(--first criteria
)
and
--problem starts here.....
(case ucase(trim(CTIPTXFX)) --Values 'FIX';'VAR';'PUR'
WHEN 'FIX'
THEN
(field1 not in
(subquery 1)
)
WHEN 'VAR'
THEN
(field1 not in
(subquery 2)
END
)
;
Can anyone tell me what I'm doing wrong, please?
I seems to me that Impala does not support the subqueries inside a Case When Statement.
Thank you.
Impala doesnt support Subqueries in the select list.
So, you need to rewrite the SQL like below -
Use LEFT ANTI JOIN in place of NOT IN() to link subqueries to T1.
To handle case when, use UNION ALL for different conditions.
SELECT * FROM T1
LEFT ANTI JOIN subqry1 y ON T1.id = y.id
WHERE col='FIX'
UNION ALL
SELECT * FROM T1
LEFT ANTI JOIN subqry2 y ON T1.id = y.id
WHERE col='VAR'
I tried to change the simple SQL you posted above. The main SQL is too complex and need table setup and data to prove the logic.
Here is my version of your simple SQL -
select * FROM T1
LEFT ANTI JOIN subquery1 ON subquery1.column = T1.field1
where (--first criteria )
and ucase(trim(CTIPTXFX))='FIX'
UNION ALL
select * FROM T1
LEFT ANTI JOIN subquery2 ON subquery2.column = T1.field1
where (--first criteria )
and ucase(trim(CTIPTXFX))='VAR'
Pls note, Anti join and union all can be expensive so if your table size if huge, please tune them accordingly.

Performance Issue in Left outer join Sql server

In my project I need find difference task based on old and new revision in the same table.
id | task | latest_Rev
1 A N
1 B N
2 C Y
2 A Y
2 B Y
Expected Result:
id | task | latest_Rev
2 C Y
So I tried following query
Select new.*
from Rev_tmp nw with (nolock)
left outer
join rev_tmp old with (nolock)
on nw.id -1 = old.id
and nw.task = old.task
and nw.latest_rev = 'y'
where old.task is null
when my table have more than 20k records this query takes more time?
How to reduce the time?
In my company don't allow to use subquery
Use LAG function to remove the self join
SELECT *
FROM (SELECT *,
CASE WHEN latest_Rev = 'y' THEN Lag(latest_Rev) OVER(partition BY task ORDER BY id) ELSE NULL END AS prev_rev
FROM Rev_tmp) a
WHERE prev_rev IS NULL
My answer assumes
You can't change the indexes
You can't use subqueries
All fields are indexed separately
If you look at the query, the only value that really reduces the resultset is latest_rev='Y'. If you were to eliminate that condition, you'd definitely get a table scan. So we want that condition to be evaluated using an index. Unfortunately a field that just values 'Y' and 'N' is likely to be ignored because it will have terrible selectivity. You might get better performance if you coax SQL Server into using it anyway. If the index on latest_rev is called idx_latest_rev then try this:
Set transaction isolated level read uncommitted
Select new.*
from Rev_tmp nw with (index(idx_latest_rev))
left outer
join rev_tmp old
on nw.id -1 = old.id
and nw.task = old.task
where old.task is null
and nw.latest_rev = 'y'
latest_Rev should be a Bit type (boolean equivalent), i better for performance (Detail here)
May be can you add index on id, task
, latest_Rev columns
You can try this query (replace left outer by not exists)
Select *
from Rev_tmp nw
where nw.latest_rev = 'y' and not exists
(
select * from rev_tmp old
where nw.id -1 = old.id and nw.task = old.task
)

SQL self-join optimization

I am writing a query that requires a self-join of a large table (> 1 million rows)
I'm only interested in the rows that were created today which I can filter using a recording_time column that contains the epoch time.
However, I'm not certain that the below query is actually limiting the tables BEFORE doing the join.
SELECT B.ani
FROM [app].[dbo].[recordings] B
INNER JOIN [app].[dbo].[recordings] A
ON B.callid = A.callid AND B.dnis = A.ani
where A.filename LIKE '%680627.wav'
AND B.recording_time > 1485340000
Filter rows that were created today and use that new table to join.
SELECT B.ani
FROM ( SELECT * FROM [app].[dbo].[recordings] where recording_time > 1485340000 ) B
INNER JOIN ( SELECT * FROM [app].[dbo].[recordings] where recording_time > 1485340000 ) A
ON B.callid = A.callid AND B.dnis = A.ani
where A.filename LIKE '%680627.wav'

Eliminating "not in" for SQL command

I would like to take a sample of an Oracle table, but not include entries from another table. I have a query that currently works, but I'm pretty sure it will blow-up when the sub-select gets more than 1000 records.
select user_key from users sample(5)
where active_flag = 'Y'
and user_key not in (
select user_key from user_validation where validation_state <> 'expired'
);
How could this be re-written without the not in. I thought of using minus, but then my sample size would keep going down as new entries were added to the user_validation table.
You can do this with a left outer join:
select *
from (select u.user_key,
count(*) over () as numrecs
from users u left outer join
user_validation uv
on u.user_key = uv.user_key and
uv.validation_state <> 'expired'
where u.active_flag = 'Y' and uv.user_key is null
) t
where rownum <= numrecs * 0.05
You are using the sample clause. It is not clear if you just want the non-matches in the 5% you choose or if you want 5% of the data that is non-matches. This is the latter.
EDIT: Added example based on author's comment:
select user_key from (
select u.user_key, row_number() over (order by dbms_random.value) as randval
from users u
left outer join user_validation uv
on u.user_key = uv.user_key
and uv.validation_state <> 'expired'
where u.active_flag = 'Y'
and uv.user_key is null
) myrandomjoin where randval <=100;
select us.user_key
from users us -- sample(5)
where us.active_flag = 'Y'
and NOT EXISTS (
SELECT *
from user_validation nx
where nx.user_key = us.user_key
AND nx.validation_state <> 'expired'
);
BTW: I commented-out the sample(5) because I don't know what it means. (I strongly believe that it is not relevant, though)
select u.user_key from users u, user_validation uv
where u.active_flag = 'Y'
and u.user_key=uv.user_key
uv.validation_state= 'expired';
This was a double negation query, x not in list of non expired ids, which is equivalent to x is in the list of expired IDs, which is what I did, in addition to changing the subquery to a join.

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
Something like:
join (
select "EsnId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
this will select the latest "EsnId" from "EsnsSalesOrderItems" based on the column creation_date. As you didn't post the structure of your tables, I had to "invent" a column name. You can use any column that allows you to define an order on the rows that suits you.
But remember the concept of the "last row" is only valid if you specifiy an order or the rows. A table as such is not ordered, nor is the result of a query unless you specify an order by
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL-server, you would use cross-apply (or rather OUTER APPLY since we need a left join), which will invoke a table-valued function on each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(#in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(#in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...