PL/SQL Query Optimization

I have a query that executes hundreds of thousands of times. It currently runs fairly fast; I was just wondering whether there is a better way to run it for even faster response times.
CODES table = 160 KB
INDEXES: INSTANCE (UNIQUE), SHORT_DESC
CODE_VALUES table = 10 MB
INDEXES: INSTANCE (UNIQUE), INTFC_INST, CODE_INST, SHORT_DESC
INTERFACES table = 160 KB
INDEXES: INSTANCE (UNIQUE), SHORT_DESC
Execution plan:
id="0" operation="SELECT STATEMENT" optimizer="ALL_ROWS" search_columns="0" cost="7">
id="1" operation="NESTED LOOPS" search_columns="0" cost="7" cardinality="1" bytes="102" cpu_cost="54,820" io_cost="7" qblock_name="SEL$1" time="1">
id="2" operation="MERGE JOIN" option="CARTESIAN" search_columns="0" cost="3" cardinality="1" bytes="33" cpu_cost="23,764" io_cost="3" time="1">
object_ID="0" id="3" operation="TABLE ACCESS" option="BY INDEX ROWID" object_name="CODES" object_type="TABLE" search_columns="0" cost="2" cardinality="1" bytes="19" cpu_cost="15,443" io_cost="2" qblock_name="SEL$1" time="1">
object_ID="1" id="4" operation="INDEX" option="RANGE SCAN" object_name="CODES_SHORT_DESC_FINDX" object_type="INDEX" search_columns="1" cost="1" cardinality="1" cpu_cost="8,171" io_cost="1" qblock_name="SEL$1" access_predicates=""A"."SYS_NC00010$"='MANAGER_GROUP'" time="1"/>
id="5" operation="BUFFER" option="SORT" search_columns="0" cost="1" cardinality="1" bytes="14" cpu_cost="8,321" io_cost="1" time="1">
object_ID="2" id="6" operation="TABLE ACCESS" option="BY INDEX ROWID" object_name="INTERFACES" object_type="TABLE" search_columns="0" cost="1" cardinality="1" bytes="14" cpu_cost="8,321" io_cost="1" qblock_name="SEL$1" time="1">
object_ID="3" id="7" operation="INDEX" option="RANGE SCAN" object_name="INTERFACES_SHORT_DESC_FINDX" object_type="INDEX" search_columns="1" cost="0" cardinality="1" cpu_cost="1,050" io_cost="0" qblock_name="SEL$1" access_predicates=""C"."SYS_NC00007>
object_ID="4" id="8" operation="TABLE ACCESS" option="BY INDEX ROWID" object_name="CODE_VALUES" object_type="TABLE" search_columns="0" cost="4" cardinality="1" bytes="69" cpu_cost="31,056" io_cost="4" qblock_name="SEL$1" filter_predicates="("A"."INSTANCE"="B"."CODE_INST" AND "B"."INTFC_INST"="C"."INSTANCE")" time="1">
object_ID="5" id="9" operation="INDEX" option="RANGE SCAN" object_name="CODE_VALUES_FUN_IDX" object_type="INDEX" search_columns="1" cost="1" cardinality="4" cpu_cost="8,771" io_cost="1" qblock_name="SEL$1" access_predicates=""B"."SYS_NC00010$"='150'" time="1"/>
SELECT A.INSTANCE, C.INSTANCE, B.LONG_DESC
FROM CODES A,
CODE_VALUES B,
INTERFACES C
WHERE A.INSTANCE = B.CODE_INST
AND B.INTFC_INST = C.INSTANCE
AND TRIM (A.SHORT_DESC) = TRIM (var1)
AND TRIM (B.SHORT_DESC) = TRIM (var2)
AND TRIM (C.SHORT_DESC) = TRIM (var3)

Avoid TRIM functions in WHERE and JOIN clauses, e.g. TRIM (A.SHORT_DESC) = TRIM (var1).
Merely creating indexes on JOIN, WHERE, and GROUP BY columns does not mean your query will always return the required results quickly.
It is the query optimizer that selects the proper index for a query to give you optimum performance, but it can only produce an optimum plan using those indexes when you help it by writing good query syntax.
Using any function (built-in or user-defined) in a WHERE or JOIN clause can dramatically decrease query performance, because this practice prevents the optimizer from selecting the proper index.
One common example is the TRIM family of functions, which developers commonly use in WHERE clauses.
USE AdventureWorks
GO
SELECT pr.ProductID, pr.Name, pr.ProductNumber, wo.*
FROM Production.WorkOrder wo
INNER JOIN Production.Product pr
ON pr.ProductID = wo.ProductID
WHERE LTRIM(RTRIM(pr.Name)) = 'HL Mountain Handlebars'
GO
SELECT pr.ProductID, pr.Name, pr.ProductNumber, wo.*
FROM Production.WorkOrder wo
INNER JOIN Production.Product pr
ON pr.ProductID = wo.ProductID
WHERE pr.Name = 'HL Mountain Handlebars'
Though both queries return the same output, the first query took almost 99% of the total execution time. That huge difference comes purely from the TRIM functions, so on production databases we must avoid TRIM and other functions in both JOIN and WHERE clauses.
Taken from AASIM ABDULLAH's blog.
So what you could/should do is run an update on your data to trim it once and for all, and start trimming values as they are added to the table, so no new data will ever require trimming.
If that is not possible for some reason, look into function-based indexes, as suggested by Maurice Reeves in the comments.
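Both fixes can be sketched in miniature. This is a hypothetical SQLite/Python stand-in for the Oracle tables above (SQLite calls function-based indexes "indexes on expressions"; the Oracle equivalent would be CREATE INDEX ... ON codes (TRIM(short_desc))):

```python
import sqlite3

# Hypothetical mini-version of the CODES table; the question is about Oracle,
# this only illustrates the two fixes suggested above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE codes (instance INTEGER PRIMARY KEY, short_desc TEXT)")
con.executemany("INSERT INTO codes VALUES (?, ?)",
                [(1, "  MANAGER_GROUP "), (2, "OTHER_GROUP  ")])

# Fix 1: clean the data once, so queries never need TRIM() again.
con.execute("UPDATE codes SET short_desc = TRIM(short_desc)")

# Fix 2 (if cleaning in place is not possible): a function-based index,
# so that WHERE TRIM(short_desc) = ... can still use an index.
con.execute("CREATE INDEX codes_trim_idx ON codes (TRIM(short_desc))")

# After the one-time UPDATE, a plain (sargable) equality predicate matches.
row = con.execute(
    "SELECT instance FROM codes WHERE short_desc = ?", ("MANAGER_GROUP",)
).fetchone()
print(row[0])  # → 1
```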

Related

How to extract a list of values from a SQL XML column

Here is my XML:
<Triggers>
<Trigger>
<Name>DrugName</Name>
<Values>
<Value>Meclofenamate</Value>
<Value>Meloxicam</Value>
<Value>Vimovo</Value>
<Value>Nabumetone</Value>
<Value>Qmiiz</Value>
<Value>Tolmetin</Value>
</Values>
</Trigger>
<Trigger>
<Name>State</Name>
<Values>
<Value>MI</Value>
</Values>
</Trigger>
<Trigger>
<Name>BenefitType</Name>
<Values>
<Value>Pharmacy</Value>
</Values>
</Trigger>
<Trigger>
<Name>LineOfBusiness</Name>
<Values>
<Value>Medicaid</Value>
</Values>
</Trigger>
</Triggers>
My goal is to get output that looks like this:
ID DrugName State BenefitType LineOfBusiness
6500 Meclofenamate MI Pharmacy Medicaid
6501 Meloxicam MI Pharmacy Medicaid
6502 Vimovo MI Pharmacy Medicaid
6503 Nabumetone MI Pharmacy Medicaid
6504 Qmiiz MI Pharmacy Medicaid
6505 Tolmetin MI Pharmacy Medicaid
After extensive searching I can't find any examples on Stack Overflow where the XML is organized this way, and the examples I have found, tweaked, and applied result in my getting a list of all the Values in one column (State values, BenefitType values, etc. mixed in with DrugName values).
The ID column is not part of the XML, but I need to have that in my output.
Here is what the table with the XML column looks like.
You need the .nodes() XML method to break out the Trigger nodes, then again for the Values rows.
To get the value of a node instead of its name, we use text().
To make sure we grab the right Trigger node for each column, we use a [] predicate (a bit like a WHERE).
.value() requires a single value, so we use [1] to take the first node.
-- Assuming @xml is an XML variable (or column) holding the document above
SELECT
DrugName = drugs.DrugName.value('text()[1]','nvarchar(100)'),
State = tr.Trigg.value('Trigger[Name/text()="State"][1]/Values[1]/Value[1]/text()[1]', 'nvarchar(100)'),
BenefitType = tr.Trigg.value('Trigger[Name/text()="BenefitType"][1]/Values[1]/Value[1]/text()[1]', 'nvarchar(100)'),
LineOfBusiness = tr.Trigg.value('Trigger[Name/text()="LineOfBusiness"][1]/Values[1]/Value[1]/text()[1]', 'nvarchar(100)')
FROM @xml.nodes('/Triggers') tr(Trigg)
OUTER APPLY tr.Trigg.nodes('Trigger[Name/text()="DrugName"][1]/Values/Value') drugs(DrugName)
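The same shredding logic can be sketched outside SQL Server. Here is a Python version using the standard library's ElementTree, with the ID values generated externally (they are not in the XML, as noted above; the base ID of 6500 is taken from the desired output):

```python
import xml.etree.ElementTree as ET

xml_doc = """<Triggers>
  <Trigger><Name>DrugName</Name><Values>
    <Value>Meclofenamate</Value><Value>Meloxicam</Value><Value>Vimovo</Value>
    <Value>Nabumetone</Value><Value>Qmiiz</Value><Value>Tolmetin</Value>
  </Values></Trigger>
  <Trigger><Name>State</Name><Values><Value>MI</Value></Values></Trigger>
  <Trigger><Name>BenefitType</Name><Values><Value>Pharmacy</Value></Values></Trigger>
  <Trigger><Name>LineOfBusiness</Name><Values><Value>Medicaid</Value></Values></Trigger>
</Triggers>"""

root = ET.fromstring(xml_doc)

# Collect each trigger's list of values, keyed by its Name element --
# the same role the [Name/text()="..."] predicates play in the T-SQL.
triggers = {t.findtext("Name"): [v.text for v in t.findall("Values/Value")]
            for t in root.findall("Trigger")}

# One output row per drug; the single-valued triggers repeat on every row.
rows = [(6500 + i, drug, triggers["State"][0],
         triggers["BenefitType"][0], triggers["LineOfBusiness"][0])
        for i, drug in enumerate(triggers["DrugName"])]

for r in rows:
    print(r)  # e.g. (6500, 'Meclofenamate', 'MI', 'Pharmacy', 'Medicaid')
```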

Left outer joins in FetchXML with multiple conditions

I'm trying to do a left outer join in FetchXML with multiple conditions.
Here is the approximate SQL of what I'm trying to achieve:
SELECT incident.*, externalCheck.*
FROM incident
LEFT OUTER JOIN externalCheck
ON externalCheck.incidentId = incident.incidentId
AND externalCheck.checkType = 1
AND externalCheck.isLatest = 1;
NB: This should always return a 1:1 relationship since our business logic requires that there is only one isLatest for each checkType.
And here is my FetchXML:
<entity name="incident">
<all-attributes />
<link-entity name="externalCheck" from="incidentId" to="incidentId" link-type="outer">
<filter type="and">
<condition attribute="checkType" operator="eq" value="1" />
<condition attribute="isLatest" operator="eq" value="1" />
</filter>
<all-attributes />
</link-entity>
</entity>
The Problem
The problem is that incident records where the right-hand side of the join is null (i.e. there is no externalCheck record) are not being returned, whereas with a left outer join I would expect the incident record to be returned even if the right-hand side of the join is null.
What I suspect is that FetchXML is converting my filter to a WHERE clause, rather than adding the conditions to the ON clause.
The Question
Can anyone confirm what is happening, and a possible solution?
Your suspicion is correct, but you can work around it to some extent.
FetchXML is flexible, and the snippet below gives the results of a left outer join with multiple conditions.
<entity name="incident">
<all-attributes />
<link-entity name="externalCheck" alias="ext" from="incidentId" to="incidentId" link-type="outer">
<filter type="and">
<condition attribute="checkType" operator="eq" value="1" />
<condition attribute="isLatest" operator="eq" value="1" />
</filter>
</link-entity>
<filter>
<condition entityname="ext" attribute="externalCheckid" operator="null" />
</filter>
</entity>
The real problem is with externalCheck.*: you cannot get the related entity's attributes this way. You have to remove the <all-attributes /> from the link-entity node.
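The underlying semantics (a condition in the ON clause preserves unmatched left-side rows; the same condition in WHERE discards them) hold in any SQL engine. A minimal sketch with hypothetical incident/externalCheck tables in SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE incident (incidentId INTEGER PRIMARY KEY);
CREATE TABLE externalCheck (incidentId INTEGER, checkType INTEGER, isLatest INTEGER);
INSERT INTO incident VALUES (1), (2);          -- incident 2 has no check row
INSERT INTO externalCheck VALUES (1, 1, 1);
""")

# Conditions in ON: unmatched incidents survive with NULLs on the right side.
on_rows = con.execute("""
    SELECT i.incidentId, e.checkType
    FROM incident i
    LEFT OUTER JOIN externalCheck e
      ON e.incidentId = i.incidentId AND e.checkType = 1 AND e.isLatest = 1
""").fetchall()

# Same conditions in WHERE: the NULL checkType fails the filter,
# so incident 2 disappears -- exactly the behavior the question describes.
where_rows = con.execute("""
    SELECT i.incidentId, e.checkType
    FROM incident i
    LEFT OUTER JOIN externalCheck e ON e.incidentId = i.incidentId
    WHERE e.checkType = 1 AND e.isLatest = 1
""").fetchall()

print(len(on_rows), len(where_rows))  # → 2 1
```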

SQL JOIN query Optimization with subqueries

I used the subquery below to get the attached records. I need to know whether it is an optimized query for my task (within three months there will be more than 10,000 records). Will it support that data load?
Can I use the JOIN keyword instead of the method below? Please advise me on how to sort this out.
I'm currently using PostgreSQL as my backend.
select worker,worktype,paymenttype,sum(output)as totalkgs_ltrs,sum(overkgs)as overkgs_ltrs,sum(workedhrs) as workedhrs,sum(scrap) as scrap,sum(cashworkincome) as cashworkincome,sum(pss) as pss
from (select
comp.name as company,
est.name as estate,
div.name as division,
wkr.name as worker,
txn.date as updateddate,
txn.type as worktype,
txn.payment_type as paymenttype,
txn.names as workedhrs,
txn.norm as norm,
txn.output as output,
txn.over_kgs as overkgs,
txn.scrap as scrap,
txn.cash_work_income as cashworkincome,
txn.pss as pss
from
bpl_daily_transaction_master txn,
res_company comp,
bpl_division_n_registration div,
bpl_estate_n_registration est,
bpl_worker wkr
where
comp.id = txn.bpl_company_id and
div.id = txn.bpl_division_id and
est.id = txn.bpl_estate_id and
wkr.id = txn.worker_id
)as subq
group by worker,worktype,paymenttype
Here is the result when I execute this query.
Here is the subquery's code; its results are tagged in the bottom section.
select
comp.name as company,
est.name as estate,
div.name as division,
wkr.name as worker,
txn.date as updateddate,
txn.type as worktype,
txn.payment_type as paymenttype,
txn.names as workedhrs,
txn.norm as norm,
txn.output as output,
txn.over_kgs as overkgs,
txn.scrap as scrap,
txn.cash_work_income as cashworkincome,
txn.pss as pss
from
bpl_daily_transaction_master txn,
res_company comp,
bpl_division_n_registration div,
bpl_estate_n_registration est,
bpl_worker wkr
where
comp.id = txn.bpl_company_id and
div.id = txn.bpl_division_id and
est.id = txn.bpl_estate_id and
wkr.id = txn.worker_id
Above is the main query's result; it shows all the records.
select wkr.name as worker,txn.type as worktype,txn.payment_type as paymenttype,sum(txn.output)as totalkgs_ltrs,sum(txn.over_kgs)as overkgs_ltrs,
sum(txn.names) as workedhrs,sum(txn.scrap) as scrap,sum(txn.cash_work_income) as cashworkincome,sum(txn.pss) as pss
from
bpl_daily_transaction_master txn
inner join res_company comp
on comp.id = txn.bpl_company_id
inner join bpl_division_n_registration div
on div.id = txn.bpl_division_id
inner join bpl_estate_n_registration est
on est.id = txn.bpl_estate_id
inner join bpl_worker wkr
on wkr.id = txn.worker_id
group by wkr.name,txn.type,txn.payment_type
What you are doing in your subquery is the old ANSI SQL-89 syntax for joining tables, which is not recommended.
But as far as performance is concerned, I don't think there is a difference, as confirmed in this Stack Overflow thread.
According to "SQL Performance Tuning" by Peter Gulutzan and Trudy
Pelzer, of the six or eight RDBMS brands they tested, there was no
difference in optimization or performance of SQL-89 versus SQL-92
style joins. One can assume that most RDBMS engines transform the
syntax into an internal representation before optimizing or executing
the query, so the human-readable syntax makes no difference.
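A quick way to convince yourself of that equivalence is to run both spellings against the same data and compare the results. A hypothetical two-table sketch in SQLite (the same holds in PostgreSQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bpl_worker (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bpl_daily_transaction_master (worker_id INTEGER, output REAL);
INSERT INTO bpl_worker VALUES (1, 'Amal'), (2, 'Nimal');
INSERT INTO bpl_daily_transaction_master VALUES (1, 10), (1, 5), (2, 7);
""")

# Old SQL-89 style: comma-separated tables, join condition in WHERE.
old_style = con.execute("""
    SELECT wkr.name, SUM(txn.output)
    FROM bpl_daily_transaction_master txn, bpl_worker wkr
    WHERE wkr.id = txn.worker_id
    GROUP BY wkr.name ORDER BY wkr.name
""").fetchall()

# SQL-92 style: explicit INNER JOIN ... ON.
new_style = con.execute("""
    SELECT wkr.name, SUM(txn.output)
    FROM bpl_daily_transaction_master txn
    INNER JOIN bpl_worker wkr ON wkr.id = txn.worker_id
    GROUP BY wkr.name ORDER BY wkr.name
""").fetchall()

print(old_style == new_style)  # → True: same rows either way
```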

SQL Inner Join. ON condition vs WHERE clause

I am busy converting a query using the old style syntax to the new join syntax. The essence of my query is as follows :
Original Query
SELECT i.*
FROM
InterestRunDailySum i,
InterestRunDetail ird,
InterestPayments p
WHERE
p.IntrPayCode = 187
AND i.IntRunCode = p.IntRunCode AND i.ClientCode = p.ClientCode
AND ird.IntRunCode = p.IntRunCode AND ird.ClientCode = p.ClientCode
New Query
SELECT i.*
FROM InterestPayments p
INNER JOIN InterestRunDailySum i
ON (i.IntRunCode = p.IntRunCode AND i.ClientCode = p.ClientCode)
INNER JOIN InterestRunDetail ird
ON (ird.IntRunCode = p.IntRunCode AND ird.IntRunCode = p.IntRunCode)
WHERE
p.IntrPayCode = 187
In this example, "Original Query" returns 46 rows, where "New Query" returns over 800
Can someone explain the difference to me? I would have assumed that these queries are identical.
The problem is with your join to InterestRunDetail. You are joining on IntRunCode twice.
The correct query should be:
SELECT i.*
FROM InterestPayments p
INNER JOIN InterestRunDailySum i
ON (i.IntRunCode = p.IntRunCode AND i.ClientCode = p.ClientCode)
INNER JOIN InterestRunDetail ird
ON (ird.IntRunCode = p.IntRunCode AND ird.ClientCode = p.ClientCode)
WHERE
p.IntrPayCode = 187
The "new query" is the one compatible with the current ANSI SQL standard for JOINs.
Also, I find query #2 much cleaner:
you are almost forced to think about and specify the join condition(s) between two tables - you will not accidentally produce Cartesian products in your query. If you list ten tables but only six join conditions in your WHERE clause, you'll get back a lot more data than expected!
your WHERE clause isn't cluttered with join conditions, so it's cleaner, less messy, and easier to read and understand
the type of your JOIN (whether INNER JOIN, LEFT OUTER JOIN, or CROSS JOIN) is typically much easier to see, since you spell it out. With the old-style syntax, the difference between those join types is rather hard to see, buried somewhere among your WHERE criteria.
Functionally, the two are identical; #1 might be deprecated sooner or later by some query engines.
Also see Aaron Bertrand's excellent Bad Habits to Kick - using old-style JOIN syntax blog post for more info - and while you're at it - read all "bad habits to kick" posts - all very much worth it!
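The row explosion in the question (46 rows becoming 800+) is easy to reproduce: repeating one join condition instead of writing the second one leaves the join under-constrained. A small sketch with hypothetical tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE p (IntRunCode INTEGER, ClientCode INTEGER);
CREATE TABLE ird (IntRunCode INTEGER, ClientCode INTEGER);
INSERT INTO p VALUES (1, 1);
-- Four detail rows for run 1, but only one of them belongs to client 1:
INSERT INTO ird VALUES (1, 1), (1, 2), (1, 3), (1, 4);
""")

# Buggy ON clause: the same condition written twice, ClientCode never checked,
# so every detail row for the run joins.
buggy = con.execute("""
    SELECT COUNT(*) FROM p INNER JOIN ird
      ON ird.IntRunCode = p.IntRunCode AND ird.IntRunCode = p.IntRunCode
""").fetchone()[0]

# Corrected ON clause, as in the answer above.
fixed = con.execute("""
    SELECT COUNT(*) FROM p INNER JOIN ird
      ON ird.IntRunCode = p.IntRunCode AND ird.ClientCode = p.ClientCode
""").fetchone()[0]

print(buggy, fixed)  # → 4 1
```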

Why does this SQL result in Index Scan instead of an Index Seek?

Can someone please help me tune this SQL query?
SELECT a.BuildingID, a.ApplicantID, a.ACH, a.Address, a.Age, a.AgentID, a.AmenityFee, a.ApartmentID, a.Applied, a.AptStatus, a.BikeLocation, a.BikeRent, a.Children,
a.CurrentResidence, a.Email, a.Employer, a.FamilyStatus, a.HCMembers, a.HCPayment, a.Income, a.Industry, a.Name, a.OccupancyTimeframe, a.OnSiteID,
a.Other, a.ParkingFee, a.Pets, a.PetFee, a.Phone, a.Source, a.StorageLocation, a.StorageRent, a.TenantSigned, a.WasherDryer, a.WasherRent, a.WorkLocation,
a.WorkPhone, a.CreationDate, a.CreatedBy, a.LastUpdated, a.UpdatedBy
FROM dbo.NPapplicants AS a INNER JOIN
dbo.NPapartments AS apt ON a.BuildingID = apt.BuildingID AND a.ApartmentID = apt.ApartmentID
WHERE (apt.Offline = 0)
AND (apt.MA = 'M')
Here's what the Execution Plan looks like:
What I don't understand is why I'm getting a Index Scan for NPapplicants. I have an Index that covers BuildingID and ApartmentID. Shouldn't that be used?
It is because the optimizer expects close to 10K matching records. Going back to the table to retrieve the other columns via 10K key lookups costs roughly as much as scanning at least 100K records and filtering with a hash match.
As for the other table, the Query Optimizer has decided that your index is useful (probably against Offline or MA), so it seeks on that index to get the join keys.
These two inputs are then hash-matched on the intersection to produce the final output.
A seek in a B-Tree index is several times as expensive as a table scan (per record).
Additionally, another seek into the clustered index must be made to retrieve the values of the other columns.
If a large portion of the records is expected to match, it is cheaper to scan the clustered index.
To make sure the optimizer has chosen the best method, you can run this:
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT a.BuildingID, a.ApplicantID, a.ACH, a.Address, a.Age, a.AgentID, a.AmenityFee, a.ApartmentID, a.Applied, a.AptStatus, a.BikeLocation, a.BikeRent, a.Children,
a.CurrentResidence, a.Email, a.Employer, a.FamilyStatus, a.HCMembers, a.HCPayment, a.Income, a.Industry, a.Name, a.OccupancyTimeframe, a.OnSiteID,
a.Other, a.ParkingFee, a.Pets, a.PetFee, a.Phone, a.Source, a.StorageLocation, a.StorageRent, a.TenantSigned, a.WasherDryer, a.WasherRent, a.WorkLocation,
a.WorkPhone, a.CreationDate, a.CreatedBy, a.LastUpdated, a.UpdatedBy
FROM dbo.NPapplicants AS a INNER JOIN
dbo.NPapartments AS apt ON a.BuildingID = apt.BuildingID AND a.ApartmentID = apt.ApartmentID
WHERE (apt.Offline = 0)
AND (apt.MA = 'M')
SELECT a.BuildingID, a.ApplicantID, a.ACH, a.Address, a.Age, a.AgentID, a.AmenityFee, a.ApartmentID, a.Applied, a.AptStatus, a.BikeLocation, a.BikeRent, a.Children,
a.CurrentResidence, a.Email, a.Employer, a.FamilyStatus, a.HCMembers, a.HCPayment, a.Income, a.Industry, a.Name, a.OccupancyTimeframe, a.OnSiteID,
a.Other, a.ParkingFee, a.Pets, a.PetFee, a.Phone, a.Source, a.StorageLocation, a.StorageRent, a.TenantSigned, a.WasherDryer, a.WasherRent, a.WorkLocation,
a.WorkPhone, a.CreationDate, a.CreatedBy, a.LastUpdated, a.UpdatedBy
FROM dbo.NPapplicants AS a WITH (INDEX (index_name))
INNER JOIN
dbo.NPapartments AS apt ON a.BuildingID = apt.BuildingID AND a.ApartmentID = apt.ApartmentID
WHERE (apt.Offline = 0)
AND (apt.MA = 'M')
Replace index_name with the actual name of your index, then compare the execution times and the numbers of I/O operations (as seen in the Messages tab).
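The scan-versus-seek decision can also be observed in miniature in SQLite (the wording below is SQLite's, not SQL Server's): a selective predicate on indexed columns is reported as a SEARCH using the index, while a predicate the index cannot help with falls back to a SCAN. A hypothetical applicants table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE applicants (BuildingID INTEGER, ApartmentID INTEGER, Name TEXT)")
con.execute("CREATE INDEX idx_bld_apt ON applicants (BuildingID, ApartmentID)")
con.executemany("INSERT INTO applicants VALUES (?, ?, ?)",
                [(b, a, "x") for b in range(100) for a in range(10)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry a human-readable step in the last column.
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Selective predicate on the indexed columns: the planner seeks (SEARCH).
seek = plan("SELECT * FROM applicants WHERE BuildingID = 5 AND ApartmentID = 3")

# Predicate on an unindexed column: nothing to seek on, so it scans.
scan = plan("SELECT * FROM applicants WHERE Name = 'x'")

print("INDEX" in seek, "SCAN" in scan)  # → True True
```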