I have this simple query but it is taking 1 min for just 0.5M records even all columns mentioned in select are in Non Clustered index.
Both tables have approx 1M records and approx 200 columns in each.
Does having lots of records in table or having lot of index causing the slowness.
SELECT catalog_items.id,
catalog_items.store_code,
catalog_items.web_code AS web_code,
catalog_items.name AS name,
catalog_items.name AS item_description,
catalog_items.image_thumnail AS image_thumnail,
catalog_items.purchase_description AS purchase_description,
catalog_items.sale_description AS sale_description,
catalog_items.taxable,
catalog_items.is_unique_item,
ISNULL(catalog_items.inventory_posting_flag, 'Y') AS inventory_posting_flag,
catalog_item_extensions.total_cost,
catalog_item_extensions.price
FROM catalog_items
LEFT OUTER JOIN catalog_item_extensions ON catalog_items.id = catalog_item_extensions.catalog_item_id
WHERE catalog_items.trans_flag = 'A';
Update: execution plan showing index missing it the same index is already there. Why?
I'm not convinced that the plan is wrong currently, on the basis that you mention selecting 500k rows, out of a table of 1m rows. Even with an index as suggested by others, the selectivity of that index is pretty weak, from a tipping point perspective (https://www.sqlskills.com/blogs/kimberly/the-tipping-point-query-answers/ ) - even with 200 columns I wouldn't expect 500k out of 1m rows per table to result in index seeks with lookups, a full scan would be faster in the CBO's view.
The missing index question - if you look closely its not just suggesting the index on trans_flag, it's suggesting to index the field and then INCUDE a number more. We can't see how many it's suggesting to include, but I would expect it to be all of them in the query and it's basically suggesting you create a covering index. Even in an NC Index Scan scenario this would be faster to scan than the base table.
We also have no information about physical layouts as yet, how the page is constructed, level of fragmentation etc, or even what kind of disks the data is on and overall size. That image_thumbnail field for example is suggestive of a large row size overall, which means we are dealing with off page storage into LOB / SLOB.
In short - even with a query plan, there is no 'easy' answer here in my view.
For this query
select . . .
from catalog_items ci left outer join
catalog_item_extensions cie
on ci.id = cie.catalog_item_id
where ci.trans_flag = 'A'
I would recommend an index on catalog_items(trans_flag, id) and catalog_item_extensions(catalog_item_id).
Related
I have a table in an Azure SQL database with ~2 million rows. On this table I have two pairs of columns that I want to filter for null - so I have filtered indexes on each pairs, checking for null on each column.
One of the indexes appears to be about twice the size of the other - (~400,000 vs ~800,000 rows).
However it seems to take about 10x as long to query.
I'm running the exact same query on both:
SELECT DomainObjectId
FROM DomainObjects
INNER JOIN
(SELECT <id1> AS Id
UNION ALL
SELECT<id2> etc...
) AS DomainObjectIds ON DomainObjectIds = DonorIds.Id
WHERE C_1 IS NULL
AND C_2 IS NULL
Where C_1/C_2 are the columns with the filtered index (and get replaced with other columns in my other query).
The query plans both involve an Index Seek - but in the fast one, it's 99% of the runtime. In the slow case, it's a mere 50% - with the other half spent filtering (which seems suspect, given the filtering should be implicit from the filtered index), and then joining to the queried IDs.
In addition, the "estimated number of rows to be read" for the index seek is ~2 million, i.e. the size of the full table. Based on that, it looks like what I'd expect a full table scan to look like. I've checked the index sizes and the "slow" one only takes up twice the space of the "fast" one, which implies it's not just been badly generated.
I don't understand:
(a) Why so much time is spent re-applying the filter from the index
(b) Why the estimated row count is so high for the "slow" index
(c) What could cause these two queries to be so different in speed, given the similarities in the filtered indexes. I would expect the slow one to be twice as slow, based purely on the number of rows matching each filter.
I have a table in which I have a lot of indexes. I noticed that in on one of them "DISTINCT_KEYS" is almost the same as "NUM_ROWS". Is such an index needed?
Or maybe it is better to remove it because:
takes a place on the database.
When adding data to a table, it does not necessarily slow down the refreshing of indexes.
What do you think? Will deleting this index slow down the queries using the name of this column?
Is such an index needed?
All you can tell from statistics like DISTINCT_KEYS and NUM_ROWS (and other statistics like histograms) is whether an index might be useful. An index is only truly "needed" if it is actually being used by queries in your system. (See ALTER INDEX ... MONITORING USAGE command)
An index having DISTINCT_KEYS that is almost equal to NUM_ROWS certainly might be useful. In fact, it would be much more natural to suspect an index to be useless if DISTINCT_KEYS was a very low percentage of NUM_ROWS.
Suppose you have a query:
SELECT column_x
FROM table_y
WHERE column_z = :some_value
Suppose the index on column_z shows DISTINCT_KEYS = 999999 and NUM_ROWS = 1000000.
That means, on average, each distinct key has (very) slightly more than one row. That makes the index very selective and very useful. When our query runs, we will use the index to pull out only one row of the table very quickly.
Suppose, instead, the index on column_z shows DISTINCT_KEYS = 2 and NUM_ROWS = 1000000. Now, each distinct key has an average of 500,000 rows. This index is worthless because we have to read each half of the blocks from the index and then still probably wind up reading at last half of the blocks from the table (probably way more than half). Worse, these reads are all single block reads. It would be way, way faster for Oracle to ignore the index and do a full table scan -- fewer blocks in total to read and all the reads are multi-block reads (e.g., 8 at a time).
For completeness, I'll point out that an index with DISTINCT_KEYS = 2 and NUM_ROWS = 1000000 could still be useful if the data is very skewed. That is, for example, if one distinct key had 999,000 rows and the other distinct key had only 1,000 rows. The index would be useful for finding the rows of that other (smaller) distinct key. Oracle gathers histograms as part of its statistics to keep track of which columns have skewed data and, if so, how many rows there are for each distinct key. (Over-simplification).
TL;DR It's very likely a good index and no more likely to be "unneeded" than any other index in your system.
I have this query on the order lines table. Its a fairly large table. I am trying to get quantity shipped by item in the last 365 days. The query works, but is very slow to return results. Should I use a function based index for this? I read a bit about them, but havent work with them much at all.
How can I make this query faster?
select OOL.INVENTORY_ITEM_ID
,SUM(nvl(OOL.shipped_QUANTITY,0)) shipped_QUANTITY_Last_365
from oe_order_lines_all OOL
where ool.actual_shipment_date>=trunc(sysdate)-365
and cancelled_flag='N'
and fulfilled_flag='Y'
group by ool.inventory_item_id;
Explain plan:
Stats are up to date, we regather once a week.
Query taking 30+ minutes to finish.
UPDATE
After adding this index:
The explain plan shows the query is using index now:
The query runs faster but not 'fast.' Completing in about 6 minutes.
UPDATE2
I created a covering index as suggested by Matthew and Gordon:
The query now completes in less than 1 second.
Explain Plan:
I still wonder why or if a function-based index would have also been a viable solution, but I dont have time to play with it right now.
As a rule, using an index that access a "significant" percentage of the rows in your table is slower than a full table scan. Depending on your system, "significant" could be as low as 5% or 10%.
So, think about your data for a minute...
How many rows in OE_ORDER_LINES_ALL are cancelled? (Hopefully not many...)
How many rows are fulfilled? (Hopefully almost all of them...)
How many rows where shipped in the last year? (Unless you have more than 10 years of history in your table, more than 10% of them...)
Put that all together and your query is probably going to have to read at least 10% of the rows in your table. This is very near the threshold where an index is going to be worse than a full table scan (or, at least not much better than one).
Now, if you need to run this query a lot, you have a few options.
Materialized view, possibly for the prior 11 months together with a live query against OE_ORDER_LINES_ALL for the current month-to-date.
A covering index (see below).
You can improve the performance of an index, even one accessing a significant percentage of the table rows, by making it include all the information required by the query -- allowing Oracle to avoid accessing the table at all.
CREATE INDEX idx1 ON OE_ORDER_LINES_ALL
( actual_shipment_date,
cancelled_flag,
fulfilled_flag,
inventory_item_id,
shipped_quantity ) ONLINE;
With an index like that, Oracle can satisfy the query by just reading the index (which is faster because it's much smaller than the table).
For this query:
select OOL.INVENTORY_ITEM_ID,
SUM(OOL.shipped_QUANTITY) as shipped_QUANTITY_Last_365
from oe_order_lines_all OOL
where ool.actual_shipment_date >= trunc(sysdate) - 365 and
cancelled_flag = 'N' and
fulfilled_flag = 'Y'
group by ool.inventory_item_id;
I would recommend starting with an index on oe_order_lines_all(cancelled_flag, fulfilled_flag, actual_shipment_date). That should do a good job in identifying the rows.
You can add the additional columns inventory_item_id and quantity_shipped to the index as well.
Let recapitulate the facts:
a) You access about 300K rows from your table (see cardinality in the 3rd line of the execution plan)
b) you use the FULL TABLE SCAN the get the data
c) the query is very slow
The first thing is to check why is the FULL TABLE SCAN so very slow - if the table is extremly large (check the BYTES in user_segments) you need to optimize the access to your data.
But remember no index will help you the get 300K rows from say 30M total rows.
Index access to 300K rows can take 1/4 of an hour or even more if th eindex is not much used and a large part of it s on the disk.
What you need is partitioning - in your case a range partitioning on actual_shipment_date - for your data size on a monthly or yearly basis.
This will eliminate the need of scaning the old data (partition pruning) and make the query much more effective.
Other possibility - if the number of rows is small, but the table size is very large - you need to reorganize the table to get better full scan time.
I've just came across some weird performance differences.
I have two selects:
SELECT s.dwh_end_date,
t.*,
'-1' as PROMOTION_DROP_EMP_CODE,
trunc(sysdate +1) as PROMOTION_END_DATE,
'K01' as PROMOTION_DROP_REASON,
-1 as PROMOTION_DROP_WO_NUMBER
FROM STG_PROMO_EXPIRE_DATE t
INNER JOIN fct_customer_services s
ON(t.dwh_product_key = s.dwh_product_key)
Which takes approximately 20 seconds.
And this one:
SELECT s.dwh_end_date,
s.dwh_product_key,
s.promotion_expire_date,
s.PROMOTION_DROP_EMP_CODE,
s.PROMOTION_END_DATE,
s.PROMOTION_DROP_REASON,
s.PROMOTION_DROP_WO_NUMBER
FROM STG_PROMO_EXPIRE_DATE t
INNER JOIN fct_customer_services s
ON(t.dwh_product_key = s.dwh_product_key)
That takes approximately 400 seconds
They are basically the same - its just to assure that I've updated my data correct (first select is to update the FCT tables) second select to make sure every thing updated correctly.
The only differences between this two selects, is the columns I select. (STG table has two columns - dwh_p_key and prom_expire_date)
First select explain plan
Second select explain plan
What can cause this weird behaviour?..
The FCT tables is indexed UNIQUE (dwh_product_key, dwh_end_date) and partitioned by dwh_end_date (250 million records), the STG doesn't have any indexes (and its only 15k records)
Thanks in advance.
The plans are not exactly the same. The first query uses a fast full scan of the index on fct_customer_services and doesn't need to access any blocks from the actual table, since you only refer to the two indexed columns.
The second query does have to access the table blocks to get the other unidexed column values. It's doing a full table scan - slower and more expensive than a full index scan. The optimiser doesn't see any improvement from using the index and then accessing specific table rows, presumably because the cardinality is too high - it needs to access too many table rows to save any effort by hitting the index first. Doing so would be even slower.
So the second query is slower because it has to read the whole table from disk/cache rather than just the whole index, and the table is much larger than the index. You can look at the segments assigned to both objects (index and table) to see the ratio of their sizes.
SQL Fiddle: http://sqlfiddle.com/#!3/23cf8
In this query, when I have an In clause on an Id, and then also select other columns, the In is evaluated first, and then the Details column and other columns are pulled in via a RID Lookup:
--In production and in SQL Fiddle, Details is grabbed via a RID Lookup after the In clause is evaluated
SELECT [Id]
,[ForeignId]
,Details
--Generate a numbering(starting at 1)
--,Row_Number() Over(Partition By ForeignId Order By Id Desc) as ContactNumber --Desc because older posts should be numbered last
FROM SupportContacts
Where foreignId In (1,2,3,5)
With this query, the Details are being pulled in via a Table Scan.
With NumberedContacts AS
(
SELECT [Id]
,[ForeignId]
--Generate a numbering(starting at 1)
,Row_Number() Over(Partition By ForeignId Order By Id Desc) as ContactNumber --Desc because older posts should be numbered last
FROM SupportContacts
Where ForeignId In (1,2,3,5)
)
Select nc.[Id]
,nc.[ForeignId]
,sc.[Details]
From NumberedContacts nc
Inner Join SupportContacts sc on nc.Id = sc.Id
Where nc.ContactNumber <= 2 --Only grab the last 2 contacts per ForeignId
;
In SqlFiddle, the second query actually gets a RID Lookup, whereas in production with a million records it produces a Table Scan (the IN clause eliminates 99% of the rows)
Otherwise the query plan shown in SQL Fiddle is identical, the only difference being that for the second query the RID Lookup in SQL Fiddle, is a Table Scan in production :(
I would like to understand possibilities that would cause this behavior? What kinds of things would you look at to help determine the cause of it using a table scan here?
How can I influence it to use a RID Lookup there?
From looking at operation costs in the actual execution plan, I believe I can get the second query very close in performance to the first query if I can get it to use a RID Lookup. If I don't select the Detail column, then the performance of both queries is very close in production. It is only after adding other columns like Detail that performance degrades significantly for the second query. When I put it in SQL Fiddle and saw that the execution plan used an RID Lookup, I was surprised but slightly confused...
It doesn't have a clustered index because in testing with different clustered indexes, there was slightly worse performance for this and other queries. That was before I began adding other columns like Details though, and I can experiment with that more, but would like to have a understanding of what is going on now before I start shooting in the dark with random indexes.
What if you would change your main index to include the Details column?
If you use:
CREATE NONCLUSTERED INDEX [IX_SupportContacts_ForeignIdAsc_IdDesc]
ON SupportContacts ([ForeignId] ASC, [Id] DESC)
INCLUDE (Details);
then neither a RID lookup nor a table scan would be needed, since your query could be satisfied from just the index itself....
The differences in the query plans will be dependent on the types of indexes that exist and the statistics of the data for those tables in the different environments.
The optimiser uses the statistics (histograms of data frequency, mostly) and the available indexes to decide which execution plan is going to be the quickest.
So, for example, you have noticed that the performance degrades when the 'Details' column is included. This is an almost sure sign that either the 'Details' column is not part of an index, or if it is part of an index, the data in that column is mostly unique such that the index accesses would be equivalent (or almost equivalent) to a table scan.
Often when this situation arises, the optimiser will choose a table scan over the index access, as it can take advantage of things like block reads to access the table records faster than perhaps a fragmented read of an index.
To influence the path that will be chose by the optimiser, you would need to look at possible indexes that could be added/modified to make an index access more efficient, but this should be done with care as it can adversely affect other queries as well as possibly degrading insert performance.
The other important activity you can do to help the optimiser is to make sure the table statistics are kept up to date and refreshed at a frequency that is appropriate to the rate of change of the frequency distribution in the table data
If it's true that 99% of the rows would be omitted if it performed the query using the relevant index + RID then the likeliest problem in your production environment is that your statistics are out of date and the optimiser doesn't realise that ForeignID in (1,2,3,5) would limit the result set to 1% of the total data.
Here's a good link for discovering more about statistics from Pinal Dave: http://blog.sqlauthority.com/2010/01/25/sql-server-find-statistics-update-date-update-statistics/
As for forcing the optimiser to follow the correct path WITHOUT updating the statistics, you could use a table hint - if you know the index that your plan should be using which contains the ID and ForeignID columns then stick that in your query as a hint and force SQL optimiser to use the index:
http://msdn.microsoft.com/en-us/library/ms187373.aspx
FYI, if you want the best performance from your second query, use this index and avoid the headache you're experiencing altogether:
create index ix1 on SupportContacts(ForeignID, Id DESC) include (Details);