Oracle SQL EXPLAIN PLAN - full table access

I have a query that joins 5 tables; it executes in about 0.2 seconds elapsed time and retrieves 36 records from my DB. Attached below is the analysis of the explain plan; as you can see, full table accesses still occur even though those tables already have indexes.
Is it necessary to fine-tune the query below?
SELECT
CASE WHEN DS.NAME = 'InteractiveCustomer' THEN 'NA' ELSE CUS.SOURCE_SYSTEM END AS SOURCE_SYSTEM,
OU.ORGUNIT_CODE AS ORGANIZATION_UNITS,
SUM(
CASE WHEN WS.NAME = 'Pending Autoclosure' THEN 1 ELSE 0 END
) AS PENDING_AUTOCLOSURE,
SUM(
CASE WHEN WS.NAME = 'New' THEN 1 ELSE 0 END
) AS NEW,
SUM(
CASE WHEN WS.NAME = 'Under Investigation' THEN 1 ELSE 0 END
) AS UNDER_INVESTIGATION,
SUM(
CASE WHEN WS.NAME = 'Escalated' THEN 1 ELSE 0 END
) AS ESCALATED,
SUM(
CASE WHEN WS.NAME = 'Recommend True Positive' THEN 1 ELSE 0 END
) AS RECOMMEND_TRUE_POSITIVE,
SUM(
CASE WHEN WS.NAME = 'Reopen Under Investigation' THEN 1 ELSE 0 END
) AS REOPEN_UNDER_INVESTIGATION
FROM
WORKFLOW_STATUSES WS
JOIN WORKFLOW_WORKITEM WW ON WS.ID = WW.STATUS_ID
JOIN WLM_ALERT_HEADER WAH ON WW.ENTITY_KEY = WAH.ALERT_KEY
INNER JOIN ORGANIZATION_UNITS OU ON OU.ID = WAH.CUSTOMER_ORGUNIT_ID
LEFT JOIN CUSTOMERS CUS ON CUS.CUSTOMER_ID = WAH.CUSTOMER_ID
INNER JOIN DATA_SOURCE DS ON WAH.AT_DATASOURCE_ID = DS.ID
WHERE
WW.ENTITY_NAME = 'WLM Alert'
GROUP BY
OU.ORGUNIT_CODE,
CUS.SOURCE_SYSTEM,
DS.NAME;

Full table accesses may still occur on a table with an index, even when a query filters on an indexed column, simply because the query optimizer may deem it faster to blat the entire table into memory than to bother with the indirection of going to the index, looking for the relevant rows, and then picking them off the disk.
A full table scan isn't necessarily a bad thing, though looking for one is a good place to start if a query is unacceptably long-running and you suspect an FTS on a very large table. On small tables a full scan is pretty insignificant.
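To check which access path the optimizer actually picked, look at the plan for each query. Here is a minimal sketch of that idea using SQLite's EXPLAIN QUERY PLAN from Python as a stand-in for Oracle's EXPLAIN PLAN. The `orders` table, the `idx_orders_customer` index and the sample data are made up for illustration, and the plans SQLite chooses won't match Oracle's.

```python
import sqlite3

# Hypothetical table and index, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def plan(sql):
    """Return the 'detail' text of each step in SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Selective predicate on the indexed column: the index is used.
indexed_plan = plan("SELECT amount FROM orders WHERE customer_id = 42")

# Every row is needed anyway, so the whole table is read despite the index.
full_scan_plan = plan("SELECT SUM(amount) FROM orders")

print(indexed_plan)
print(full_scan_plan)
```

The first plan reports a SEARCH using the index, the second a SCAN of the table. Neither is wrong; the second query simply has no use for the index.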
You asked if it was necessary to fine-tune the query. My personal opinion on this is "no, not at this stage". As per my comment, boost the relevant data tables by a million rows and run it again to get an idea of how it will scale. It's possible you'll get an entirely different plan. Even if it ends up running for 5 seconds, balance that against how many times this data will be asked for in prod. If it's every 10 seconds then sure, do something about it. If it's once a month when the accounts team sends out the invoices, don't bother with it even if it takes a minute.
"Premature optimization is the root of all evil"

Related

Improve CASE WHEN Performance

I want to calculate customer retention week over week. My sales_orders table has columns order_date, and customer_name. Basically I want to check if a customer in this week also had an order the previous week. To do this, I have used CASE WHEN and subquery as follows (I have extracted order_week in a cte I've called weekly_customers and gotten distinct customer names within each week):
SELECT wc.order_week,
wc.customer,
CASE
WHEN wc.customer IN (
SELECT sq.customer
FROM weekly_customers sq
WHERE sq.order_week = (wc.order_week - 1))
THEN 'YES'
ELSE 'NO'
END AS present_in_previous_week
from weekly_customers wc
The query returns the correct data. My issue: the table is really huge, with about 15000 distinct weekly values, which obviously leads to a very long execution time. Is there a way I can improve this loop, or even an alternative to the loop altogether?
Something like this:
SELECT
wca.order_week,
wca.customer,
CASE WHEN wcb.customer IS NOT NULL THEN 'YES' ELSE 'NO' END AS present_in_previous_week
FROM weekly_customers AS wca
LEFT JOIN
weekly_customers AS wcb
ON
wca.customer = wcb.customer
AND wca.order_week - 1 = wcb.order_week
This joins all of the customer data onto the customer data from a week ago. If there is a record for a week ago then wcb.customer will not be null, and we can set the flag to "YES". Otherwise, we set the flag to "NO".
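A small runnable sketch of the self-join, using SQLite from Python. The sample rows are made up, and `order_week` is assumed to be a plain integer week number so that `order_week - 1` is the previous week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly_customers (order_week INTEGER, customer TEXT)")
# Made-up sample data: one row per (week, customer), already distinct.
conn.executemany(
    "INSERT INTO weekly_customers VALUES (?, ?)",
    [(1, "alice"), (1, "bob"), (2, "alice"), (2, "carol"), (3, "carol")],
)

rows = conn.execute(
    """
    SELECT wca.order_week,
           wca.customer,
           CASE WHEN wcb.customer IS NOT NULL THEN 'YES' ELSE 'NO' END
               AS present_in_previous_week
    FROM weekly_customers AS wca
    LEFT JOIN weekly_customers AS wcb
           ON wca.customer = wcb.customer
          AND wca.order_week - 1 = wcb.order_week
    ORDER BY wca.order_week, wca.customer
    """
).fetchall()

for row in rows:
    print(row)
```

Because the CTE holds distinct customers per week, each row on the left matches at most one row on the right, so the join does not multiply rows.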

Using Max(boolean) in a case statement

I have made a temp table of accounts in a database, with booleans that provide insight about the accounts. Some customers have multiple accounts, so I am grouping them together and trying to look at the MAX(Boolean) to set a status field.
My query kinda looks like:
with t as (Select lngCustomerNumber,
Case
When 'Criteria for being Active' Then 1
End as blnActive,
Case
When 'Criteria for unexpired' Then 1
End as blnUnexpired
From AccountTable)
Select t.CustomerNumber,
Case
When Max(t.blnActive) = 1
AND Max(t.blnUnexpired) = 1 Then 'Active/Unexpired'
When Max(t.blnActive) = 1
AND Max(t.blnUnexpired) = 0 Then 'Active/Expired'
When Max(t.blnActive) = 0
AND Max(t.blnUnexpired) = 1 Then 'Inactive/Unexpired'
When Max(t.blnActive) = 0
AND Max(t.blnUnexpired) = 0 Then 'Inactive/Expired'
End As strLicenseStatus
From T
Group By t.CustomerNumber
Anything where it checks if the Max(Boolean) = 1 will calc to True correctly, but if I do Max(Boolean) = 0 or Max(Boolean) <> 1 then it does not calc to True when it should.
I have tested by just looking at the grouped Temp Table with each boolean bringing back its Max() value and the ones that should be 0 are coming back as 0.
As a workaround, I have tried
Where t.CustomerNumber NOT IN (SELECT t2.CustomerNumber
FROM t t2
WHERE t2.blnUnexpired = 1
AND t2.CustomerNumber = t.CustomerNumber )
And that does give me the results that I am looking for but I have millions of rows coming back so it has been timing out after many hours, where the previous method was able to run in less than an hour.
I have some other data in my query, the one presented is a much smaller version used to highlight my issue.
Any recommendations on how I can make this work?
Thank you.
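When you define your blnActive and blnUnexpired cases, you only have the "1" case defined, which means that a row which doesn't meet the criteria gets NULL. MAX() ignores NULLs, so a customer with no matching rows ends up with MAX(...) = NULL, and comparing NULL with 0 is never true. Try adding else 0 to each case: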
with t as (Select lngCustomerNumber,
Case
When 'Criteria for being Active' Then 1
Else 0
End as blnActive,
Case
When 'Criteria for unexpired' Then 1
Else 0
End as blnUnexpired
From AccountTable)
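To see the difference, here is a minimal sketch using SQLite from Python. The `accounts` table, its `active` column and the sample rows are made up for illustration and stand in for the real criteria.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (customer INTEGER, active INTEGER)")
# Customer 1 has one active account, customer 2 has none.
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [(1, 1), (1, 0), (2, 0), (2, 0)]
)


def statuses(flag_expr):
    """Group by customer and classify on MAX() of the given flag expression."""
    sql = f"""
        WITH t AS (SELECT customer, {flag_expr} AS bln_active FROM accounts)
        SELECT customer,
               CASE WHEN MAX(bln_active) = 1 THEN 'Active'
                    WHEN MAX(bln_active) = 0 THEN 'Inactive'
               END
        FROM t
        GROUP BY customer
        ORDER BY customer
    """
    return conn.execute(sql).fetchall()


# No ELSE: non-matching rows are NULL, so MAX() is NULL for customer 2.
without_else = statuses("CASE WHEN active = 1 THEN 1 END")
# With ELSE 0: MAX() is 0 for customer 2 and the '= 0' branch matches.
with_else = statuses("CASE WHEN active = 1 THEN 1 ELSE 0 END")

print(without_else)
print(with_else)
```

Without the ELSE, customer 2 falls through every branch and gets no status at all; with it, customer 2 is classified as 'Inactive'.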

Divide by zero error when using >, < or <> in Where statement. No division operator involved

I have two CTEs using the same table which holds receipts. Receipt type "a" says how much is billed and may or may not have an amount received. Receipt type "b"s have how much was received if it was received outside of the original receipt. These are matched up by mnth, cusnbr and job. Also on the receipt is how much is allocated for different expenses on the receipt.
I am trying to total up people's hours if the receipt has been paid at least 99%. These records are also based on cusnbr, jobnbr and mnth. The code below works fine.
with billed as(Select cusnbr
,job
,mnth
,sum(bill_item_1) as 'Billed Item'
,sum(billed) as 'Billed'
From accounting
Where mytype in ('a','b')
Group by cusnbr
,job
,mnth),
paid as(Select cusnbr
,job
,mnth
,sum(rcpt_item_1) as 'Rcpt Item'
,sum(billed) as 'Paid'
From accounting
Where mytype in ('a','b')
Group by cusnbr
,job
,mnth)
Select b.cusnbr
,b.job
,b.mnth
,sum(g.hours) as 'Total Hours'
,b.[Billed Item]
,p.[Rcpt Item]
From billed b inner join paid p
on b.cusnbr = p.cusnbr
and b.job = p.job
and b.mnth = p.mnth
inner join guys g
on b.cusnbr = g.cusnbr
and b.job = g.job
and b.mnth = g.mnth
Where p.[Paid]/b.Billed > .99
The issue I'm having is if I try to add
and b.[Billed Item] <> 0
To the where clause.
I get "Divide by zero error encountered"
I have tried making the last query a CTE with
case when b.[Billed Item] = 0 then 1 else 0 end as flag
and then making another query which checks that flag <> 0
I have tried using isnull(b.[Billed Item],0) in the last query as well as isnull(bill_item_1,0) in the first CTE.
I can get around this issue by dumping the whole thing into a temp table and querying that, but just want to know why this is happening. Using ">","<" or "<>" against b.[Billed Item] results in a divide by zero error.
Use nullif():
Where p.[Paid] / nullif(b.Billed, 0) > 0.99
This returns NULL when Billed is 0, and NULL does not meet the condition. As long as Billed is positive, you can also phrase this more simply without division as:
where p.paid > b.billed * 0.99
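A quick sketch comparing the two forms, using SQLite from Python with made-up rows. One caveat about the stand-in: SQLite returns NULL on division by zero instead of raising an error as SQL Server does, so this only illustrates which rows each filter keeps, not the error itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipts (job TEXT, paid REAL, billed REAL)")
# Made-up rows: A is fully paid, B is half paid, C has nothing billed.
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?)",
    [("A", 100.0, 100.0), ("B", 50.0, 100.0), ("C", 10.0, 0.0)],
)


def jobs(where):
    """Return the jobs that pass the given WHERE condition."""
    sql = f"SELECT job FROM receipts WHERE {where} ORDER BY job"
    return [row[0] for row in conn.execute(sql)]


# NULLIF turns a zero divisor into NULL, so job C is filtered out.
with_nullif = jobs("paid / NULLIF(billed, 0) > 0.99")
# The multiplication form never divides, but it keeps job C (10 > 0).
with_multiply = jobs("paid > billed * 0.99")
# Guarding on billed > 0 makes the two forms agree.
with_guard = jobs("billed > 0 AND paid > billed * 0.99")

print(with_nullif, with_multiply, with_guard)
```

So the multiplication form is only a drop-in replacement when rows with a zero (or negative) billed amount cannot occur or are excluded explicitly.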
I can't answer your question specifically, but I can tell you that SQL does not process all commands if it does not need to. For example,
SELECT COUNT(1/0)
Happily returns 1. So it is quite possible that the order of the conditions causes the optimizer to filter out an unnecessary division-by-zero condition.

Query taking too long - Optimization

I am having an issue with the following query returning results a bit too slow and I suspect I am missing something basic. My initial guess is the 'CASE' statement is taking too long to process its result on the underlying data. But it could be something in the derived tables as well.
The question is, how can I speed this up? Are there any glaring errors in the way I am pulling the data? Am I running into a sorting or looping issue somewhere? The query runs for about 40 seconds, which seems quite long. C# is my primary expertise, SQL is a work in progress.
Note I am not asking "write my code" or "fix my code". Just for a pointer in the right direction, I can't seem to figure out where the slowdown occurs. Each derived table runs very quickly (less than a second) by itself, the joins seem correct and the result set is returning exactly what I need. It's just too slow and I'm sure there are better SQL scripters out there ;) Any tips would be greatly appreciated!
SELECT
hdr.taker
, hdr.order_no
, hdr.po_no as display_po
, cust.customer_name
, hdr.customer_id
, 'INCORRECT-LARGE ORDER' + CASE
WHEN (ext_price_calc >= 600.01 and ext_price_calc <= 800) and fee_price.unit_price <> round(ext_price_calc * -.01,2)
THEN '-1%: $' + cast(cast(ext_price_calc * -.01 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc >= 800.01 and ext_price_calc <= 1000 and fee_price.unit_price <> round(ext_price_calc * -.02,2)
THEN '-2%: $' + cast(cast(ext_price_calc * -.02 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc > 1000 and fee_price.unit_price <> round(ext_price_calc * -.03,2)
THEN '-3%: $' + cast(cast(ext_price_calc * -.03 as decimal(18,2)) as varchar(255))
ELSE
'OK'
END AS Status
FROM
(myDb_view_oe_hdr hdr
LEFT OUTER JOIN myDb_view_customer cust
ON hdr.customer_id = cust.customer_id)
LEFT OUTER JOIN wpd_view_sales_territory_by_customer territory
ON cust.customer_id = territory.customer_id
LEFT OUTER JOIN
(select
order_no,
SUM(ext_price_calc) as ext_price_calc
from
(select
hdr.order_no,
line.item_id,
(line.qty_ordered - isnull(qty_canceled,0)) * unit_price as ext_price_calc
from myDb_view_oe_hdr hdr
left outer join myDb_view_oe_line line
on hdr.order_no = line.order_no
where
line.delete_flag = 'N'
AND line.cancel_flag = 'N'
AND hdr.projected_order = 'N'
AND hdr.delete_flag = 'N'
AND hdr.cancel_flag = 'N'
AND line.item_id not in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%', 'FUEL','NET-FUEL', 'CONVENIENCE-FEE')) as line
group by order_no) as order_total
on hdr.order_no = order_total.order_no
LEFT OUTER JOIN
(select
order_no,
count(order_no) as convenience_count
from oe_line with (nolock)
left outer join inv_mast inv with (nolock)
on oe_line.inv_mast_uid = inv.inv_mast_uid
where inv.item_id in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%')
and oe_line.delete_flag <> 'Y'
group by order_no) as fee_count
on hdr.order_no = fee_count.order_no
INNER JOIN
(select
order_no,
unit_price
from oe_line line with (nolock)
where line.inv_mast_uid in (select inv_mast_uid from inv_mast with (nolock) where item_id in ('LARGE-ORDER-1%','LARGE-ORDER-2%', 'LARGE-ORDER-3%'))) as fee_price
ON fee_count.order_no = fee_price.order_no
WHERE
hdr.projected_order = 'N'
AND hdr.cancel_flag = 'N'
AND hdr.delete_flag = 'N'
AND hdr.completed = 'N'
AND territory.territory_id = 'CUSTOMERTERRITORY'
AND ext_price_calc > 600.00
AND hdr.carrier_id <> '100004'
AND fee_count.convenience_count is not null
AND CASE
WHEN (ext_price_calc >= 600.01 and ext_price_calc <= 800) and fee_price.unit_price <> round(ext_price_calc * -.01,2)
THEN '-1%: $' + cast(cast(ext_price_calc * -.01 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc >= 800.01 and ext_price_calc <= 1000 and fee_price.unit_price <> round(ext_price_calc * -.02,2)
THEN '-2%: $' + cast(cast(ext_price_calc * -.02 as decimal(18,2)) as varchar(255))
WHEN ext_price_calc > 1000 and fee_price.unit_price <> round(ext_price_calc * -.03,2)
THEN '-3%: $' + cast(cast(ext_price_calc * -.03 as decimal(18,2)) as varchar(255))
ELSE
'OK' END <> 'OK'
Just as a clue to the right direction for optimization:
When you do an OUTER JOIN to a query with calculated columns, you are guaranteeing not only a full table scan, but that those calculations must be performed against every row in the joined table. It appears that you can actually do your join to oe_line without the column calculations (i.e. by filtering ext_price_calc to a specific range).
You don't need to do most of the subqueries that are in your query; the master query can be recrafted to use regular table join syntax. Joins to subqueries containing subqueries present a challenge to the SQL optimizer that it may not be able to meet. But by using regular joins, the optimizer has a much better chance at identifying more efficient query strategies.
You don't tag which SQL engine you're using. Every database has proprietary extensions that may allow for speedier or more efficient queries. It would be easier to provide useful feedback if you indicated whether you were using MySQL, SQL Server, Oracle, etc.
Regardless of the database you're using, reviewing the query plan is always a good place to start. This will tell you where most of the I/O and time in your query is being spent.
Just on general principle, make sure your statistics are up-to-date.
It may not be solvable by any of us without the real stuff to test with.
If that's the case and nobody else posts the answer, I can still help. Here is how to troubleshoot it.
(1) take joins and pieces out one by one.
(2) this will cause errors. Remove or fake the references to get rid of them.
(3) see how that works.
(4) Put items back before you try taking something else out
(5) keep track...
(6) also be aware where a removal of something might drastically reduce the result set.
You might find you're missing an index or some other smoking gun.
I was having the same problem and I was able to solve it by indexing one of the tables and setting a primary key.
I strongly suspect that the problem lies in the number of joins you're doing. A lot of databases do joins basically by systematically checking all possible combinations of the various tables as being valid - so if you're joining table A and B on column C, and A looks like:
Name:C
Fred:1
Alice:2
Betty:3
While B looks like:
C:Pet
1:Alligator
2:Lion
3:T-Rex
When you do the join, it checks all 9 possibilities:
Fred:1:1:Alligator
Fred:1:2:Lion
Fred:1:3:T-Rex
Alice:2:1:Alligator
Alice:2:2:Lion
Alice:2:3:T-Rex
Betty:3:1:Alligator
Betty:3:2:Lion
Betty:3:3:T-Rex
And goes through and deletes the non-matching ones:
Fred:1:1:Alligator
Alice:2:2:Lion
Betty:3:3:T-Rex
... which means with three entries in each table, it creates nine temporary records, sorts through them all, and deletes six of them ... all before it actually sorts through the results for what you're after (so if you are looking for Betty's Pet, you only want one row on that final result).
... and you're doing how many joins and sub-queries?
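The all-combinations approach described above is a nested-loop join. Here is a rough sketch of it in Python, next to a hash-based join for comparison. Real engines pick among several join strategies depending on indexes and statistics, so treat this only as an illustration of the cost difference; the function names are made up.

```python
# The two small tables from the example above.
people = [("Fred", 1), ("Alice", 2), ("Betty", 3)]
pets = [(1, "Alligator"), (2, "Lion"), (3, "T-Rex")]


def nested_loop_join(a, b):
    """Compare every row of a with every row of b; keep the matches."""
    comparisons = 0
    matches = []
    for name, key_a in a:
        for key_b, pet in b:
            comparisons += 1
            if key_a == key_b:
                matches.append((name, pet))
    return matches, comparisons


def hash_join(a, b):
    """Index b by join key once, then look up each row of a directly."""
    by_key = {}
    for key_b, pet in b:
        by_key.setdefault(key_b, []).append(pet)
    matches = []
    for name, key_a in a:
        for pet in by_key.get(key_a, []):
            matches.append((name, pet))
    return matches


loop_matches, loop_comparisons = nested_loop_join(people, pets)
hash_matches = hash_join(people, pets)

print(loop_matches, loop_comparisons)
print(hash_matches)
```

Both produce the same three rows, but the nested loop makes 3 x 3 = 9 comparisons, and that number grows with the product of the table sizes, while the hash version touches each row once. An index on the join column gives the engine a similar shortcut.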

How would I write this SQL query?

I have the following tables:
PERSON_T     DISEASE_T            DRUG_T
=========    ==========           ========
PERSON_ID    DISEASE_ID           DRUG_ID
GENDER       PERSON_ID            PERSON_ID
NAME         DISEASE_START_DATE   DRUG_START_DATE
             DISEASE_END_DATE     DRUG_END_DATE
I want to write a query that takes an input of a disease id and returns one row for each person in the database with a column for the gender, a column for whether or not they have ever had the disease, and a column for each drug which specifies if they took the drug before contracting the disease. I.E. true would mean drug_start_date < disease_start_date. False would mean drug_start_date>disease_start_date or the person never took that particular drug.
We currently pull all of the data from the database and use Java to create a 2D array with all of these values. We are investigating moving this logic into the database. Is it possible to create a query that will return the result set as I want it or would I have to create a stored procedure? We are using Postgres, but I assume an SQL answer for another database will easily translate to Postgres.
Based on the info provided:
SELECT p.name,
p.gender,
CASE WHEN d.disease_id IS NULL THEN 'N' ELSE 'Y' END AS had_disease,
dt.drug_id
FROM PERSON_T p
LEFT JOIN DISEASE_T d ON d.person_id = p.person_id
AND d.disease_id = ?
LEFT JOIN DRUG_T dt ON dt.person_id = p.person_id
AND dt.drug_start_date < d.disease_start_date
..but there's going to be a lot of rows that will look duplicate except for the drug_id column.
You're essentially looking to create a cross-tab query with the drugs. While there are plenty of OLAP tools out there that can do this sort of thing (among all sorts of other slicing and dicing of the data), doing something like this in traditional SQL is not easy (and, in general, impossible to do without some sort of procedural syntax in all but the simplest scenarios).
You essentially have two options when doing this with SQL (well, more accurately, you have one option, and another more complicated but flexible option that derives from it):
1. Use a series of CASE statements in your query to produce columns that are representative of each individual drug. This requires knowing the list of variable values (i.e. drugs) ahead of time.
2. Use a procedural SQL language, such as T-SQL, to dynamically construct a query that uses CASE statements as described above, but along with obtaining that list of values from the data itself.
The two options essentially do the same thing, you're just trading simplicity and ease of maintenance for flexibility in the second option.
For example, using option 1:
select
p.NAME,
p.GENDER,
(case when d.DISEASE_ID is null then 0 else 1 end) as HAD_DISEASE,
(case when sum(case when dr.DRUG_ID = 1 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_1,
(case when sum(case when dr.DRUG_ID = 2 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_2,
(case when sum(case when dr.DRUG_ID = 3 then 1 else 0 end) > 0 then 1 else 0 end) as TOOK_DRUG_3
from PERSON_T p
left join DISEASE_T d on d.PERSON_ID = p.PERSON_ID and d.DISEASE_ID = #DiseaseId
left join DRUG_T dr on dr.PERSON_ID = p.PERSON_ID and dr.DRUG_START_DATE < d.DISEASE_START_DATE
group by p.PERSON_ID, p.NAME, p.GENDER, d.DISEASE_ID
As you can tell, this gets a little laborious as you get outside of just a few potential values.
The other option is to construct this query dynamically. I don't know PostgreSQL and what, if any, procedural capabilities it has, but the overall procedure would be this:
Gather list of potential DRUG_ID values along with names for the columns
Prepare three string values: the SQL prefix (everything before the first drug-related CASE statement), the SQL suffix (everything after the last drug-related CASE statement), and the dynamic portion
Construct the dynamic portion by combining drug CASE statements based upon the previously retrieved list
Combine them into a single (hopefully valid) SQL statement and execute
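The steps above can be sketched as follows, with Python doing the string assembly and SQLite standing in for the real database. This is an illustration under assumptions, not a PostgreSQL-specific solution: the sample rows are made up, dates are stored as ISO text so that `<` compares them correctly, and `MAX(CASE ...)` is used as a shorter equivalent of the `SUM(...) > 0` test in the static query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person_t (person_id INTEGER, gender TEXT, name TEXT);
    CREATE TABLE disease_t (disease_id INTEGER, person_id INTEGER,
                            disease_start_date TEXT);
    CREATE TABLE drug_t (drug_id INTEGER, person_id INTEGER,
                         drug_start_date TEXT);
    INSERT INTO person_t VALUES (1, 'F', 'Ann'), (2, 'M', 'Bob');
    INSERT INTO disease_t VALUES (7, 1, '2020-06-01');
    INSERT INTO drug_t VALUES (1, 1, '2020-01-01'),
                              (2, 1, '2020-09-01'),
                              (2, 2, '2020-01-01');
    """
)


def build_query(drug_ids):
    """Assemble prefix + one CASE column per drug + suffix."""
    prefix = (
        "SELECT p.name, p.gender, "
        "CASE WHEN d.disease_id IS NULL THEN 0 ELSE 1 END AS had_disease"
    )
    # int() guards against anything but numeric ids ending up in the SQL text.
    dynamic = "".join(
        f", MAX(CASE WHEN dr.drug_id = {int(i)} THEN 1 ELSE 0 END)"
        f" AS took_drug_{int(i)}"
        for i in drug_ids
    )
    suffix = """
        FROM person_t p
        LEFT JOIN disease_t d ON d.person_id = p.person_id
                             AND d.disease_id = ?
        LEFT JOIN drug_t dr ON dr.person_id = p.person_id
                           AND dr.drug_start_date < d.disease_start_date
        GROUP BY p.person_id, p.name, p.gender, d.disease_id
        ORDER BY p.person_id
    """
    return prefix + dynamic + suffix


# Step 1: gather the list of drug ids from the data itself.
drug_ids = [
    row[0]
    for row in conn.execute("SELECT DISTINCT drug_id FROM drug_t ORDER BY drug_id")
]
# Steps 2-4: build the statement and run it for disease id 7.
rows = conn.execute(build_query(drug_ids), (7,)).fetchall()

for row in rows:
    print(row)
```

The disease id is still passed as a bound parameter; only the column list is generated. The number of result columns changes with the data, so whatever consumes the result has to read the column names rather than assume a fixed layout.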