Where in (sub query) or (list) performance issues - sql

I'd like to make a list based on whether a field in the original table is in two lists. My code is thus:
SELECT *
FROM ListofPlaces
WHERE Property = 'MODERATE'
and (HOMELAND in (
SELECT distinct HOMELAND
FROM PLANS
WHERE left(plans.code, 1) = '1')
or HOMELAND in (
'PlaceA'
, 'PlaceB'
, 'PlaceC'
, 'PlaceD'
, 'PlaceE'))
The list and the subquery each work fine individually, taking 00:00:01.43 for the subquery and 00:00:00.13 for the list, but they take around a minute once combined.
I have tried using a left join, but this leads to an even more significant reduction in performance.
The table 'PLANS' is a larger table of 4M+ rows, whilst ListofPlaces has fewer than 1000.
My question is whether I'm using the and/or operators efficiently, and if so, is there a more efficient way to run this query?

Try rewriting this using UNION:
SELECT *
FROM ListofPlaces
WHERE Property = 'MODERATE' AND
HOMELAND IN (SELECT HOMELAND
FROM PLANS
WHERE left(plans.code, 1) = '1'
)
UNION
SELECT *
FROM ListofPlaces
WHERE Property = 'MODERATE' AND
HOMELAND in ('PlaceA', 'PlaceB', 'PlaceC', 'PlaceD', 'PlaceE');
The optimizer can sometimes be confused by ORs. Use UNION rather than UNION ALL here if the two lists can overlap, since UNION removes the duplicate rows. If you know they are disjoint, use UNION ALL, which skips the deduplication step.

Why not join?
SELECT LoP.*
FROM ListofPlaces LoP
left join PLANS Pl
on LoP.HOMELAND = Pl.HOMELAND
WHERE Property = 'MODERATE'
and (left(Pl.code, 1) = '1'
or LoP.HOMELAND in (
'PlaceA'
, 'PlaceB'
, 'PlaceC'
, 'PlaceD'
, 'PlaceE'))
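One thing to watch with the join: every matching row in PLANS produces another copy of the ListofPlaces row, so the result can contain duplicates that the original IN query did not return. A minimal sketch of an alternative uses EXISTS, so each place is returned at most once. It assumes plans.code is a character column, which lets LIKE '1%' stand in for left(...); that form gives the optimizer a chance to use an index on PLANS (HOMELAND, code), if such an index exists (the question does not mention one).

```sql
-- Sketch only: assumes PLANS.code is a character column.
SELECT LoP.*
FROM ListofPlaces LoP
WHERE LoP.Property = 'MODERATE'
  AND (LoP.HOMELAND IN ('PlaceA', 'PlaceB', 'PlaceC', 'PlaceD', 'PlaceE')
       OR EXISTS (SELECT 1
                  FROM PLANS Pl
                  WHERE Pl.HOMELAND = LoP.HOMELAND
                    AND Pl.code LIKE '1%'));
```

Whether this beats the UNION version depends on the optimizer and the available indexes, so it is worth comparing the execution plans of both.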

Related

Is it possible to use UNION here instead of OR?

UPDATE: Changed title. Previous title "Does UNION instead of OR always speed up queries?"
Here is my query. The question is concerning the second last line with the OR:
SELECT distinct bigUnionQuery.customer
FROM ((SELECT buyer.customer
FROM membership_vw buyer
JOIN account_vw account
ON account.buyer = buyer.id
WHERE account.closedate >= 'some_date')
UNION
(SELECT joint.customer
FROM entity_vw joint
JOIN transactorassociation_vw assoc
ON assoc.associatedentity = joint.id
JOIN account_vw account
ON account.buyer = assoc.entity
WHERE assoc.account is null and account.closedate >= 'some_date')
UNION
(SELECT joint.customer
FROM entity_vw joint
JOIN transactorassociation_vw assoc
ON assoc.associatedentity = joint.id
JOIN account_vw account
ON account.id = assoc.account
WHERE account.closedate >= '2021-02-11 00:30:22.339'))
AS bigUnionQuery
JOIN entity_vw
ON entity_vw.customer = bigUnionQuery.customer OR entity_vw.id = bigUnionQuery.customer
WHERE entity_vw.lastmodifieddate >= 'some_date';
The original query doesn't have the OR in the second last line. Adding the OR here has slowed down the query. I'm wondering if there is a way to use UNION here to speed it up.
I tried doing (pseudo):
bigUnionQuery bq join entity_vw e on e.customer = bq.customer
union
bigUnionQuery bq join entity_vw e on e.id = bq.customer
But that slowed down the query even more, probably because the bigUnionQuery is a large, slow query, and running it twice in the UNION is not the correct way. What would be the right way to use UNION here, or is it always going to be faster with OR?
Does UNION instead of OR always speed up queries? Not always, but in some cases it does, and I think it depends on your indexes too. I have worked on tables with about 1 million records, and my queries usually get faster when I use UNION instead of OR.
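If the database supports common table expressions, one way to write the UNION form without spelling out bigUnionQuery twice is to name it once in a WITH clause and join it in two branches. This is an untested sketch against the views from the question; the aliases bq and e are illustrative. Whether the engine evaluates the CTE once or re-runs it for each reference depends on the product and version, so check the execution plan before relying on it.

```sql
-- Sketch only: the three branches are copied from the question unchanged.
WITH bq AS (
    SELECT buyer.customer
    FROM membership_vw buyer
    JOIN account_vw account
      ON account.buyer = buyer.id
    WHERE account.closedate >= 'some_date'
    UNION
    SELECT joint.customer
    FROM entity_vw joint
    JOIN transactorassociation_vw assoc
      ON assoc.associatedentity = joint.id
    JOIN account_vw account
      ON account.buyer = assoc.entity
    WHERE assoc.account IS NULL
      AND account.closedate >= 'some_date'
    UNION
    SELECT joint.customer
    FROM entity_vw joint
    JOIN transactorassociation_vw assoc
      ON assoc.associatedentity = joint.id
    JOIN account_vw account
      ON account.id = assoc.account
    WHERE account.closedate >= '2021-02-11 00:30:22.339'
)
-- Each branch uses a plain equality join; UNION removes duplicates,
-- which matches the SELECT DISTINCT of the original query.
SELECT bq.customer
FROM bq
JOIN entity_vw e
  ON e.customer = bq.customer
WHERE e.lastmodifieddate >= 'some_date'
UNION
SELECT bq.customer
FROM bq
JOIN entity_vw e
  ON e.id = bq.customer
WHERE e.lastmodifieddate >= 'some_date';
```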

The SELECT statement below takes a long time to run

This SELECT statement takes a long time to run. After investigating, I found that the problem is in the subquery (stored procedure). I would appreciate your help.
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
AND COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
AND COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Well there are a few issues with your SELECT statement that you should address:
First let's look at this condition:
COKE_CHQ_NUMBER NOT IN (SELECT DISTINCT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
WHERE UPPER(COKE_CHQ_NUMBER_DELIVER_STATUS) <> 'DELIVERED')
First, you select the DISTINCT cheque numbers that have a non-delivered status, and then you say you don't want them. Rather than saying "I don't want the non-delivered ones", it is much more readable to say "I want the delivered ones". This is not really a performance issue, but it would make your SELECT easier to read and understand.
Second let's look at your second cheque condition:
COKE_CHQ_NUMBER NOT IN (SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V)
Here you exclude every cheque that has an entry in Q_COKE_AP_CHECKS_DELIVERY_ST_V. This makes your first condition (the one with DISTINCT) redundant, as whatever cheque numbers it brings back would be rejected by this second condition anyway. I don't know whether the Oracle SQL engine is clever enough to work out this redundancy, but it could be the cause of your slowness, as SELECT DISTINCT can take longer to run.
In addition to this, if you don't have them already, I would recommend adding the following indexes:
CREATE INDEX index_1 ON q_coke_ap_checks_sign_status_v(coke_chq_number, coke_pay_supplier);
CREATE INDEX index_2 ON q_coke_ap_checks_sign_status_v(plan_id, coke_signature__a, coke_signature__b, coke_audit);
CREATE INDEX index_3 ON q_coke_ap_checks_delivery_st_v(coke_chq_number_deliver);
I named the indexes index_1, index_2 and index_3 to keep the example easy to read; obviously that is not a good naming convention.
With these in place, your SELECT should retrieve your data with acceptable performance. Of course, it all depends on the size and distribution of your data, which is hard to judge without performing specific data analysis.
Looking at your code, it seems you have a redundant WHERE condition: the second NOT IN implies the first, so you could drop the first one.
You could also transform your NOT IN clause into a MINUS: subtract from the main query the same query INNER JOINed with your NOT IN subquery.
Lastly, make sure you have a proper composite index on the table
Q_COKE_AP_CHECKS_SIGN_STATUS_V
on the columns (plan_id, COKE_SIGNATURE__A, COKE_SIGNATURE__B, COKE_AUDIT, COKE_CHQ_NUMBER, COKE_PAY_SUPPLIER)
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM
apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
MINUS
SELECT DISTINCT
COKE_CHQ_NUMBER,
COKE_PAY_SUPPLIER
FROM apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V
INNER JOIN (
SELECT COKE_CHQ_NUMBER_DELIVER
FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V
) T ON T.COKE_CHQ_NUMBER_DELIVER = apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V.COKE_CHQ_NUMBER
WHERE
plan_id = 40192
AND COKE_SIGNATURE__A = 'YES'
AND COKE_SIGNATURE__B = 'YES'
AND COKE_AUDIT = 'YES'
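Another common rewrite, offered here only as a sketch, is NOT EXISTS. It returns the same rows as the second NOT IN as long as neither COKE_CHQ_NUMBER nor COKE_CHQ_NUMBER_DELIVER contains NULLs, which is an assumption about the data. If the subquery of a NOT IN returns even one NULL, the NOT IN condition matches no rows at all, whereas NOT EXISTS is not affected by that. The aliases s and d are illustrative.

```sql
-- Sketch only: assumes the cheque number columns are never NULL.
SELECT DISTINCT
    s.COKE_CHQ_NUMBER,
    s.COKE_PAY_SUPPLIER
FROM apps.Q_COKE_AP_CHECKS_SIGN_STATUS_V s
WHERE s.plan_id = 40192
  AND s.COKE_SIGNATURE__A = 'YES'
  AND s.COKE_SIGNATURE__B = 'YES'
  AND s.COKE_AUDIT = 'YES'
  AND NOT EXISTS (SELECT 1
                  FROM apps.Q_COKE_AP_CHECKS_DELIVERY_ST_V d
                  WHERE d.COKE_CHQ_NUMBER_DELIVER = s.COKE_CHQ_NUMBER);
```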

I have code that performs a search on 2 large tables using a wildcard predicate

When searching for an exact value, using the below SQL, the results are returned within 25 seconds. However, when using the LIKE keyword and wildcards, the result is never returned (we have to cancel the query).
The wildcard query replaces the penultimate line with AND A.VENDOR_NO LIKE '%526000802'
I have tried adding an index to the table on just VENDOR_NO, but it did not help.
SELECT A.AGY AS AGY, A.VENDOR_NO AS VENDOR_NO,
'****' CONCAT SUBSTR(A.VENDOR_NO,5,6) AS VENDOR_NO_MASKED,
A.VENDOR_NAME AS VENDOR_NAME, A.FY AS FY, A.EFFECTIVE_DATE AS
EFFECTIVE_DATE,
A.BATCH_AGY AS BATCH_AGY, A.BATCH_DATE AS BATCH_DATE,
A.BATCH_TYPE AS BATCH_TYPE,
A.BATCH_NO AS BATCH_NO, A.BATCH_SEQ_NO AS BATCH_SEQ_NO,
A.INVOICE_NO AS INVOICE_NO,
A.INVOICE_DESC AS INVOICE_DESC, A.WARRANT_WRIT_DATE AS WARRANT_WRIT_DATE,
A.WARRANT_NO AS WARRANT_NO, A.ARCHIVE_REF_NO AS ARCHIVE_REF_NO,
A.CUR_DOC_NO AS CUR_DOC_NO, A.CUR_DOC_SFX AS CUR_DOC_SFX,
A.REF_DOC_NO AS REF_DOC_NO,
A.REF_DOC_SFX AS REF_DOC_SFX, B.GLA AS GLA, A.TCODE AS TCODE, A.PCA AS PCA,
A.OBJECT AS OBJECT, A.COBJ AS COBJ, A.AOBJ AS AOBJ, A.INDEX_CODE AS
INDEX_CODE,
A.APPN_NO AS APPN_NO, A.APPD_FUND AS APPD_FUND, A.FUND AS FUND,
B.GL_POST_AMT AS GL_POST_AMT
FROM A60PRD.TB_ADT1_ARCH A LEFT OUTER JOIN A60PRD.TB_ADTG_NEW B
ON A.AGY = B.AGY AND A.BATCH_AGY = B.BATCH_AGY
AND A.BATCH_DATE = B.BATCH_DATE AND A.BATCH_TYPE = B.BATCH_TYPE
AND A.BATCH_NO = B.BATCH_NO AND A.BATCH_SEQ_NO = B.BATCH_SEQ_NO
AND A.TRANS_ID_SFX = B.TRANS_ID_SFX
WHERE A.AGY BETWEEN 'AAA' AND '999'
AND (GLA = '3500' OR GLA = '3501')
AND (CUR_DOC_NO LIKE 'V%' OR CUR_DOC_NO LIKE 'D%')
AND A.VENDOR_NO = '1526000802'
ORDER BY AGY, BATCH_AGY, FY, EFFECTIVE_DATE ;
That LIKE is actually an ends-with check. If your SQL dialect supports the REVERSE(string) function, you can maintain a reversed copy of the column, VENDOR_NO_REVERSED, and turn the search into a prefix match:
AND A.VENDOR_NO_REVERSED LIKE '208000625%'
Additionally, add an index on the VARCHAR column VENDOR_NO_REVERSED.
Alternatively (more of a hack), add an indexed field VENDOR_NO_LAST = MOD(VENDOR_NO, 1000) and an extra condition AND VENDOR_NO_LAST = 802. This scales less well and is slower.
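The reversed-column idea could look roughly like this. It is a sketch under several assumptions: REVERSE() exists in your dialect (it does in MySQL, SQL Server and PostgreSQL, for example), the length VARCHAR(20) and the index name are placeholders, and the exact ALTER TABLE syntax varies between products. The extra column also has to be kept in sync whenever rows are inserted or updated, for instance with a trigger.

```sql
-- Hypothetical names: VENDOR_NO_REVERSED, IX_ADT1_VENDOR_NO_REV.
-- VARCHAR(20) is a guess; match it to the real VENDOR_NO definition.
ALTER TABLE A60PRD.TB_ADT1_ARCH
    ADD COLUMN VENDOR_NO_REVERSED VARCHAR(20);

-- One-off backfill of the reversed values.
-- If VENDOR_NO is a space-padded CHAR column, trim it before reversing.
UPDATE A60PRD.TB_ADT1_ARCH
SET VENDOR_NO_REVERSED = REVERSE(VENDOR_NO);

CREATE INDEX IX_ADT1_VENDOR_NO_REV
    ON A60PRD.TB_ADT1_ARCH (VENDOR_NO_REVERSED);

-- '%526000802' becomes a prefix search on the reversed string,
-- which an ordinary index can serve.
SELECT A.VENDOR_NO
FROM A60PRD.TB_ADT1_ARCH A
WHERE A.VENDOR_NO_REVERSED LIKE '208000625%';
```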
You have replaced an equality operator with a LIKE expression, which often hurts performance if the database is not tuned accordingly.
In your case, the % wildcard is at the start of the LIKE expression. Most likely this makes it impossible to use the index's tree traversal, so your statement will have to do full table scans, which is bad for runtime performance.
See e.g. https://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning

Select distinct rows in MySQL

I am running into a problem and can't figure out what is causing it.
I want to select all offers from a database with a DISTINCT statement, so that every offer ID is pulled only once. Here's what I got, in screenshot form from Navicat:
Screenshot http://www.dreshar.com/sql.jpg
For anyone who can't see the image, this is the SELECT. Where is my error?
SELECT DISTINCT
guf_offers.id as OfferId,
guf_offers.validfrom AS OfferValidFrom,
guf_offers.validtill AS OfferValidTill,
guf_offers.days AS OfferDays,
guf_offers.active,
guf_offers.roomtype1 AS OfferRoomtype1,
guf_offers.roomprice1 AS OfferRoomprice1,
guf_countries.caption AS CountryCaption,
guf_courts.caption AS CourtCaption,
guf_hotels.caption AS HotelCaption,
guf_hotels.id AS HotelId,
guf_hotel_images.image AS HotelImage,
guf_offer_types.caption AS OffertypeCaption,
guf_regions.caption AS RegionCaption,
guf_offers.hoteloncourse AS OfferHotelOnCourse,
guf_offers.wellnessspa AS OfferWellnessSpa,
guf_offers.18hole AS Offer18Hole,
guf_offers.topangebot AS OfferTopangebot
FROM guf_offers , guf_countries , guf_courts , guf_hotels , guf_hotel_images , guf_offer_types , guf_regions
WHERE guf_offers.country_id = guf_countries.id AND guf_offers.court_id = guf_courts.id AND guf_offers.hotel_id = guf_hotels.id AND guf_hotel_images.hotel_id = guf_offers.hotel_id AND guf_offers.offer_type_id = guf_offer_types.id AND guf_offers.region_id = guf_regions.id AND guf_offers.active = 1 AND STR_TO_DATE(guf_offers.validtill, '%d.%m.%Y') > STR_TO_DATE('29.04.2013', '%d.%m.%Y')
DISTINCT returns each distinct COMBINATION of columns. For example, the first two rows with OfferId = 89 in your image show different values in the HotelImage column, so they are DISTINCT results.
If you want distinct ids, then you probably want to use a group by. The query would look something like:
SELECT
guf_offers.id as OfferId,
guf_offers.validfrom AS OfferValidFrom,
guf_offers.validtill AS OfferValidTill,
guf_offers.days AS OfferDays,
guf_offers.active,
guf_offers.roomtype1 AS OfferRoomtype1,
guf_offers.roomprice1 AS OfferRoomprice1,
guf_countries.caption AS CountryCaption,
guf_courts.caption AS CourtCaption,
guf_hotels.caption AS HotelCaption,
guf_hotels.id AS HotelId,
guf_hotel_images.image AS HotelImage,
guf_offer_types.caption AS OffertypeCaption,
guf_regions.caption AS RegionCaption,
guf_offers.hoteloncourse AS OfferHotelOnCourse,
guf_offers.wellnessspa AS OfferWellnessSpa,
guf_offers.18hole AS Offer18Hole,
guf_offers.topangebot AS OfferTopangebot
FROM guf_offers , guf_countries , guf_courts , guf_hotels , guf_hotel_images , guf_offer_types , guf_regions
WHERE guf_offers.country_id = guf_countries.id AND guf_offers.court_id = guf_courts.id AND guf_offers.hotel_id = guf_hotels.id AND guf_hotel_images.hotel_id = guf_offers.hotel_id AND guf_offers.offer_type_id = guf_offer_types.id AND guf_offers.region_id = guf_regions.id AND guf_offers.active = 1 AND STR_TO_DATE(guf_offers.validtill, '%d.%m.%Y') > STR_TO_DATE('29.04.2013', '%d.%m.%Y')
group by guf_offers.id
The other columns take arbitrary values from the rows in each group. This uses a MySQL extension to GROUP BY that allows columns in the SELECT even when they are neither inside an aggregate function nor in the GROUP BY clause.
By the way, you should learn to do your joins in the FROM clause using proper JOIN syntax. It makes queries more readable and maintainable.
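As a rough illustration of both points, here is a shortened sketch with explicit JOINs and only a few of the columns. Instead of relying on the GROUP BY extension, which MySQL rejects when the ONLY_FULL_GROUP_BY SQL mode is enabled, it picks one image per offer with MIN(). Using MIN() is an arbitrary choice for the example, and the aliases o, h and i are illustrative.

```sql
-- Sketch only: column list shortened; MIN() picks one image per offer.
SELECT
    o.id         AS OfferId,
    h.caption    AS HotelCaption,
    MIN(i.image) AS HotelImage
FROM guf_offers AS o
JOIN guf_hotels AS h
  ON h.id = o.hotel_id
JOIN guf_hotel_images AS i
  ON i.hotel_id = o.hotel_id
WHERE o.active = 1
  AND STR_TO_DATE(o.validtill, '%d.%m.%Y') > STR_TO_DATE('29.04.2013', '%d.%m.%Y')
GROUP BY o.id, h.caption;
```

The remaining tables from the question can be added with further JOIN ... ON clauses in the same way.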

MS Access SQL: Troubles combining UNION ALL with a LEFT JOIN

I have created a query in MS Access to simulate a FULL OUTER JOIN and combine the results that looks something like the following:
SELECT NZ(estimates.employee_id, actuals.employee_id) AS employee_id
, NZ(estimates.a_date, actuals.a_date) AS a_date
, estimates.estimated_hours
, actuals.actual_hours
FROM (SELECT *
FROM estimates
LEFT JOIN actuals ON estimates.employee_id = actuals.employee_id
AND estimates.a_date = actuals.a_date
UNION ALL
SELECT *
FROM estimates
RIGHT JOIN actuals ON estimates.employee_id = actuals.employee_id
AND estimates.a_date = actuals.a_date
WHERE estimates.employee_id IS NULL
OR estimates.a_date IS NULL) AS qFullJoinEstimatesActuals
I have saved this query as an object (let's call it qJoinedTable). My objective is to LEFT JOIN qJoinedTable with another table. Something like the following:
SELECT *
FROM qJoinedTable
LEFT JOIN (SELECT *
FROM labor_rates) AS rates
ON qJoinedTable.employee_id = rates.employee_id
AND qJoinedTable.a_date BETWEEN rates.begin_date AND rates.end_date
MS Access accepts the syntax and runs the query, but it omits results that are clearly within the result set. Wondering if the date format was somehow lost, I placed a FORMAT around the begin_date and end_date to force them to be interpreted as Short Dates. Oddly, this produced a different result set, but it still omitted results that it shouldn't have.
I am wondering if the queries are performed in such a way that you can't LEFT JOIN the result set of a UNION ALL. Does anyone have any thoughts/ideas on this? Is there a better way of accomplishing the end goal?
I would try breaking each part of the query into its own Access query object, e.g.
SELECT *
FROM estimates
LEFT JOIN actuals ON estimates.employee_id = actuals.employee_id
AND estimates.a_date = actuals.a_date
Would be qryOne
SELECT *
FROM estimates
RIGHT JOIN actuals ON estimates.employee_id = actuals.employee_id
AND estimates.a_date = actuals.a_date
WHERE estimates.employee_id IS NULL
OR estimates.a_date IS NULL
Would be qryTwo
SELECT * FROM qryOne
UNION ALL
SELECT * FROM qryTwo
Would be qryFullJoinEstimatesActuals, and finally
SELECT NZ(estimates.employee_id, actuals.employee_id) AS employee_id
, NZ(estimates.a_date, actuals.a_date) AS a_date
, estimates.estimated_hours
, actuals.actual_hours
FROM qryFullJoinEstimatesActuals
I've found that constructs that don't work in complex Access SQL statements often do work properly if they are broken down into individual query objects and reassembled step-by-step. Additionally, you can test each part of the query individually. This will help you find a workaround if one proves to be necessary.
You can find exactly how to do this here.
You're missing an INNER JOIN.... UNION ALL step.
Consistent with the odd behavior surrounding the dates, this issue turned out to be related to the use of NZ to select a date from qFullJoinEstimatesActuals. The use of NZ appears to make the data type ambiguous. As such, the following line from the example in my post caused the error:
, NZ(estimates.a_date, actuals.a_date) AS a_date
The ambiguous data type of a_date caused the BETWEEN operator to produce erroneous results when comparing a_date to rates.begin_date and rates.end_date in the LEFT JOIN. The issue was resolved by type casting the result of the NZ function, as follows:
, CDate(NZ(estimates.a_date, actuals.a_date)) AS a_date