Issues with Oracle Query execution time - sql

Below is my query. It joins four tables, and when I limit it to 1,000 records it takes around 5.5 seconds, but when I raise the limit to 100,000 it runs for what seems like forever (I last cancelled it after 7 hours).
Does anyone have any idea what I am doing wrong, or what could be done to speed up the query?
This query will probably end up having to return millions of records. I have only limited it to 100,000 for the purpose of testing, and it seems to fall over at even this small amount.
For the record, I'm on Oracle 8.
CREATE TABLE co_tenancyind_batch01 AS
SELECT /*+ CHOOSE */ ou_num,
x_addr_relat,
x_mastership_flag,
x_ten_3rd_party_source
FROM s_org_ext,
s_con_addr,
s_per_org_unit,
s_contact
WHERE s_org_ext.row_id = s_con_addr.accnt_id
AND s_org_ext.row_id = s_per_org_unit.ou_id
AND s_per_org_unit.per_id = s_contact.row_id
AND x_addr_relat IS NOT NULL
AND rownum < 100000
Explain Plan in Picture : http://imgur.com/Xw9x4BA (easy to read)

Your test based on 100,000 rows is not meaningful if you are then going to run it for many millions. The optimiser knows that it can satisfy the query faster when it has a stopkey by using nested loop joins.
When you run it for a very large data set you are likely to need a different plan, most likely one using hash joins. Covering indexes might help with that, but we can't tell because the selected columns are not qualified with table aliases that would tell us which table they come from. You are most likely to hit memory problems with large hash joins. These could be ameliorated with hash partitioning, but there's no way the Siebel people would go for that, so you'll have to use manual memory management and monitor v$sql_workarea to see how much you really need.
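As a rough sketch of what a fully qualified, hash-join version of the query might look like: the assignment of each selected column to a table below is an assumption (the original query does not say), and whether the ORDERED and USE_HASH hints actually help would need to be confirmed against the explain plan.

```sql
-- Sketch only: which table owns each selected column is an ASSUMPTION,
-- because the original query does not qualify its columns.
CREATE TABLE co_tenancyind_batch01 AS
SELECT /*+ ORDERED USE_HASH(ca pou c) */
       o.ou_num,                  -- assumed to come from s_org_ext
       ca.x_addr_relat,           -- assumed to come from s_con_addr
       ca.x_mastership_flag,      -- assumed to come from s_con_addr
       c.x_ten_3rd_party_source   -- assumed to come from s_contact
FROM   s_org_ext      o,
       s_con_addr     ca,
       s_per_org_unit pou,
       s_contact      c
WHERE  o.row_id   = ca.accnt_id
AND    o.row_id   = pou.ou_id
AND    pou.per_id = c.row_id
AND    ca.x_addr_relat IS NOT NULL;
```

With the columns qualified like this, it becomes possible to judge whether an index could cover the columns needed from each table.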
(Hate the visual explain plan, by the way).

First of all, can you make sure there is an index on the S_CONTACT table and that it is enabled?
If so, try the SELECT statement with the /*+ CHOOSE */ hint and have another look at the explain plan to see whether the optimizer mode is still RULE. I believe the cost-based optimizer would give a better result for this query.
If it is still RULE, try updating the database statistics and try again. You can use the DBMS_STATS package for that purpose; if I am not wrong, it was introduced with version 8i. Are you using 8i?
Finally, I don't know the row counts or the cardinality between the tables. I might have been more helpful if I knew the design.
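For reference, the index check and the statistics refresh might look roughly like this. The schema owner 'SIEBEL' is an assumption, and the USER_INDEXES query assumes you are connected as the table owner.

```sql
-- Check which indexes exist on S_CONTACT and whether they are usable.
SELECT index_name, status
FROM   user_indexes
WHERE  table_name = 'S_CONTACT';

-- Oracle 8.0: gather statistics with ANALYZE (sampling to limit the cost).
ANALYZE TABLE s_contact ESTIMATE STATISTICS SAMPLE 10 PERCENT;

-- Oracle 8i and later: DBMS_STATS. 'SIEBEL' is an assumed schema owner.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'SIEBEL',
    tabname => 'S_CONTACT',
    cascade => TRUE);   -- also gather statistics for the table's indexes
END;
/
```

The same would need to be repeated for the other three tables in the join, since the cost-based optimizer needs statistics on all of them to choose a sensible plan.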

Judging by the last execution plan, your dataset appears to be huge. Instead of limiting the number of returned rows, you could limit access to the base table, like this:
CREATE TABLE co_tenancyind_batch01 AS
SELECT /*+ CHOOSE */ ou_num,
x_addr_relat,
x_mastership_flag,
x_ten_3rd_party_source
FROM s_org_ext,
s_con_addr,
s_per_org_unit,
(select * from s_contact where rownum <= 100000) cont
WHERE s_org_ext.row_id = s_con_addr.accnt_id
AND s_org_ext.row_id = s_per_org_unit.ou_id
AND s_per_org_unit.per_id = cont.row_id
AND x_addr_relat IS NOT NULL
This should improve things, but it will not be extremely quick.

Related

Azure SQL Query Performance Issue

I'm running a query against an Azure SQL DB...
select Id
from Table1
WHERE ([Table1].[CustomFieldString2] IS NULL) AND
(N'New' = [Table1].[CustomFieldString7]) AND (0 = [Table1].[Deleted])
This query runs fast roughly 300ms...
As soon as I add another column to my select (bool) as in
Select Id, IsActive
my query is super slow (minutes)
This doesn't make any sense...
Was wondering if anyone knew what this could be
In summary, when you add columns that are not part of the index to the SELECT list, SQL Server can no longer use the same execution plan.
If SQL Server estimates that there are few rows, it will opt for nested-loop key lookups in the execution plan. This can take more time if the estimates are wrong.
If there are more rows, or the key lookup cost crosses some threshold, SQL Server may instead decide that a scan of the table is likely to be more efficient.
If the query performance is not acceptable, try adding IsActive to the index's included column list.
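A minimal sketch of such an index follows. The index name is hypothetical, and the choice of key columns is an assumption based on the WHERE clause in the question; the existing index on Table1 may look different.

```sql
-- Hypothetical index name; key columns are taken from the WHERE clause.
-- INCLUDE stores IsActive at the leaf level so no key lookup is needed.
CREATE NONCLUSTERED INDEX IX_Table1_CustomFields
ON Table1 (CustomFieldString7, Deleted, CustomFieldString2)
INCLUDE (Id, IsActive);
```

If Id is already the clustered key, listing it in INCLUDE is redundant but harmless. Note that this only works if the key columns have index-compatible types (for example, not nvarchar(max)).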
Your query construction matters, of course, but Azure is naturally slow. It runs on cloud systems, so it is not that fast (I assume you are using the free version). I have not seen anyone pleased with Azure's speed at the low price tiers.

Does using the TOP X * format in SQL speed up queries significantly?

So lately when I run queries on huge tables I'll use the top 10 * notation like so:
select top 10 * from BI_Sessions (nolock)
where SessionSID like 'b6d%'
and CreateDate between '03-15-2012' AND '05-18-2012'
I thought that it lets the query run faster, but it doesn't seem so: this one took 4 minutes (or is that an OK time?).
I guess I'm curious whether the TOP functionality is applied after it pulls all the data anyway (which would seem inefficient).
thanks
It depends entirely on the query, with the exception of "Top 0". "Top 0" does return much faster.
In your case, the query has to look through the rows in a huge table to find rows that match the WHERE clause. If no rows are found, the number of rows being returned doesn't help. If the rows are at the end of the table scan, then the number of rows being returned doesn't help.
There are certain cases with more complicated queries where the "top" could affect performance. There is a difference between optimizing overall and for the first row returned. I'm not sure if SQL Server's optimizer recognizes this difference.
Well, it depends. If you do not have a covering index on BI_Sessions and it's a large database then the answer is probably. A good covering index may be something like: CreateDate, SessionSID, and all the columns you actually need to return. If you do have a covering index, then SQL Server will not even read the table; it will get all the data it needs from the covering index. If you specify only the columns you actually need to return, 10 rows should possibly come back in a fraction of a second.
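As an illustration, assuming the query only needs SessionSID and CreateDate (the real column list is not given in the question), the index and the narrowed query might look like this. The index name is hypothetical.

```sql
-- Hypothetical index name. Key columns follow the order suggested above.
-- If the query needs more columns, add them with INCLUDE (...).
CREATE NONCLUSTERED INDEX IX_BI_Sessions_CreateDate_SessionSID
ON BI_Sessions (CreateDate, SessionSID);

-- Select only the columns the index covers instead of *.
-- 'YYYYMMDD' literals avoid dependence on the session's date format.
SELECT TOP 10 SessionSID, CreateDate
FROM   BI_Sessions WITH (NOLOCK)
WHERE  SessionSID LIKE 'b6d%'
AND    CreateDate BETWEEN '20120315' AND '20120518';
```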
for more useful info
http://www.mssqltips.com/sqlservertip/1078/improve-sql-server-performance-with-covering-index-enhancements/
and a bit more technical:
http://www.simple-talk.com/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/
also
http://www.sqlserverinternals.com/
and
http://www.insidesqlserver.com/thebooks.html

Speed of paged queries in Oracle

This is a never-ending topic for me and I'm wondering if I might be overlooking something. Essentially I use two types of SQL statements in an application:
1. Regular queries with a "fallback" limit
2. Sorted and paged queries
Now, we're talking about some queries against tables with several million records, joined to 5 more tables with several million records. Clearly, we hardly want to fetch all of them, that's why we have the above two methods to limit user queries.
Case 1 is really simple. We just add an additional ROWNUM filter:
WHERE ...
AND ROWNUM < ?
That's quite fast, as Oracle's CBO will take this filter into consideration for its execution plan and probably apply a FIRST_ROWS operation (similar to the one enforced by the /*+FIRST_ROWS*/ hint).
Case 2, however is a bit more tricky with Oracle, as there is no LIMIT ... OFFSET clause as in other RDBMS. So we nest our "business" query in a technical wrapper as such:
SELECT outer.* FROM (
SELECT * FROM (
SELECT inner.*, ROWNUM as RNUM, MAX(ROWNUM) OVER(PARTITION BY 1) as TOTAL_ROWS
FROM (
[... USER SORTED business query ...]
) inner
)
WHERE ROWNUM < ?
) outer
WHERE outer.RNUM > ?
Note that the TOTAL_ROWS field is calculated so that we know how many pages there will be without fetching all the data. This paging query is usually quite satisfactory. But every now and then (as I said, when querying 5M+ records, possibly including non-indexed searches), it runs for 2-3 minutes.
EDIT: Please note, that a potential bottleneck is not so easy to circumvent, because of sorting that has to be applied before paging!
I'm wondering, is that state-of-the-art simulation of LIMIT ... OFFSET, including TOTAL_ROWS in Oracle, or is there a better solution that will be faster by design, e.g. by using the ROW_NUMBER() window function instead of the ROWNUM pseudo-column?
The main problem with Case 2 is that in many cases the whole query result set has to be obtained and then sorted before the first N rows can be returned - unless the ORDER BY columns are indexed and Oracle can use the index to avoid a sort. For a complex query and a large set of data this can take some time. However there may be some things you can do to improve the speed:
Try to ensure that no functions are called in the inner SQL - these may get called 5 million times just to return the first 20 rows. If you can move these function calls to the outer query they will be called less.
Use a FIRST_ROWS_n hint to nudge Oracle into optimising for the fact that you will never return all the data.
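A sketch of the hinted paging wrapper, without the TOTAL_ROWS analytic, might look like the following. The ORDERS table and its columns are hypothetical stand-ins for the business query, and :first_row and :last_row are assumed bind variables marking the page boundaries (for example 40 and 60 for rows 41 to 60).

```sql
-- Hypothetical ORDERS table standing in for the sorted business query.
SELECT *
FROM  (
        SELECT /*+ FIRST_ROWS(20) */
               q.*, ROWNUM AS rnum
        FROM  (
                SELECT o.order_id, o.created_at
                FROM   orders o
                ORDER  BY o.created_at DESC, o.order_id
              ) q
        WHERE ROWNUM <= :last_row   -- upper bound lets Oracle stop early
      )
WHERE rnum > :first_row;            -- lower bound applied on the alias
```

The ORDER BY includes a unique column (order_id here) so that rows with equal sort values do not move between pages from one request to the next.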
EDIT:
Another thought: you are currently presenting the user with a report that could return thousands or millions of rows, but the user is never realistically going to page through them all. Can you not force them to select a smaller amount of data e.g. by limiting the date range selected to 3 months (or whatever)?
You might want to trace the query that takes a lot of time and look at its explain plan. Most likely the performance bottleneck comes from the TOTAL_ROWS calculation. Oracle has to read all the data, even if you only fetch one row, this is a common problem that all RDBMS face with this type of query. No implementation of TOTAL_ROWS will get around that.
The radical way to speed up this type of query is to forego the TOTAL_ROWS calculation. Just display that there are additional pages. Do your users really need to know that they can page through 52486 pages? An estimate may be sufficient. That's another solution, implemented by Google Search for example: estimate the number of pages instead of actually counting them.
Designing an accurate and efficient estimation algorithm might not be trivial.
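One simple way to show "there are additional pages" without counting, sketched here against a hypothetical ORDERS table, is to fetch one row more than the page size. The bind variables :first_row and :page_size are assumptions for illustration.

```sql
-- Hypothetical ORDERS table. Ask for one row beyond the page:
-- if :page_size + 1 rows come back, a further page exists.
SELECT *
FROM  (
        SELECT q.*, ROWNUM AS rnum
        FROM  (
                SELECT o.order_id, o.created_at
                FROM   orders o
                ORDER  BY o.created_at DESC, o.order_id
              ) q
        WHERE ROWNUM <= :first_row + :page_size + 1
      )
WHERE rnum > :first_row;
```

The application would display only the first :page_size rows and use the presence of the extra row to decide whether to render a "next" link.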
A "LIMIT ... OFFSET" is pretty much syntactic sugar. It might make the query look prettier, but if you still need to read the whole of a data set and sort it and get rows "50-60", then that's the work that has to be done.
If you have an index in the right order, then that can help.
It may perform better to run two queries instead of trying to count() and return the results in the same query. Oracle may be able to answer the count() without any sorting or joining to all the tables (join table elimination based on declared foreign key constraints). This is what we generally do in our application. For performance important statements, we write a separate query that we know will return the correct count as we can sometimes do better than Oracle.
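A sketch of the two-query approach is shown below. The schema is hypothetical: ORDERS(order_id, customer_id, created_at) with a NOT NULL customer_id and a declared foreign key to CUSTOMERS(customer_id, name).

```sql
-- Query 1: the count. The join to CUSTOMERS is dropped by hand because,
-- given the NOT NULL foreign key, it cannot change the number of rows.
SELECT COUNT(*) AS total_rows
FROM   orders o
WHERE  o.created_at >= :from_date;

-- Query 2: one page of data, with the join and the sort.
SELECT *
FROM  (
        SELECT q.*, ROWNUM AS rnum
        FROM  (
                SELECT o.order_id, o.created_at, c.name
                FROM   orders o, customers c
                WHERE  c.customer_id = o.customer_id
                AND    o.created_at >= :from_date
                ORDER  BY o.created_at DESC, o.order_id
              ) q
        WHERE ROWNUM <= :last_row
      )
WHERE rnum > :first_row;
```

The count only needs to be run once per search, not once per page, as long as a slightly stale total is acceptable.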
Alternatively, you can make a tradeoff between performance and recency of the data. Bringing back the first 5 pages is going to be nearly as quick as bringing back the first page. So you could consider storing the results from 5 pages in a temporary table along with an expiry date for the information. Take the result from the temporary table if valid. Put a background task in to delete the expired data periodically.

SQL Server: How can a table scan be so expensive on a tiny table?

I'm looking at an execution plan from a troublesome query.
I can see that 45% of the plan is taken up doing a table scan on a table with seven (7) rows of data.
I am about to put a clustered index to cover the columns in my query on a table with seven rows and it feels...wrong. How can this part of my query take up so much of the plan given the table is so tiny?
I was reading up here and I feel it might just be because of non-contiguous data: there are no indexes at all on the table in question. Overall, though, our database is large-ish (7GB) and busy.
I'd love to know what others think - thanks!
EDIT:
The query is run very frequently and was involved in deadlock (and chosen as the victim). Right now it's taking between 300ms and 500ms to run, but will take longer when the database is busier.
The query:
select l.team1Score, l.team2Score, ls.team1ExternalID, ls.team2ExternalID, et.eventCategoryID, e.eventID, ls.statusCode
from livescoretracking l(nolock)
inner join liveScores ls (nolock) on l.liveScoreID = ls.liveScoreID
inner join db1.dbo.events e on e.gameid = ls.gameid
inner join db1.dbo.eventtype et (nolock) on e.eventTypeID = et.eventTypeID
inner join eventCategoryPayTypeMappings ecb (nolock) on ( et.eventCategoryID = ecb.eventCategoryID and e.payTypeID = ecb.payTypeID and ecb.mainEvent = 1 )
where ls.gameID = 286711 order by l.dateinserted
The problem table is the eventCategoryPayTypeMappings table - thanks!
A percentage cost is meaningless without knowing the total cost in real terms. For example, if the query takes 1 ms to execute, a 45% cost for a table scan is 0.45 of a millisecond, which is not worth trying to optimise; if the query takes 10 seconds to execute, then the 45% cost is significant and worth optimising.
A table scan on a seven row table is not expensive. Barring query hints, the query engine will use a table scan on such a small table no matter what indexes exist. Can you show us more about the query in question and the problem with the execution plan?
If there are no indexes on the table, the query engine will always have to do a table scan. There's no other way it can process the data.
Many RDBMS platforms will do a table scan on a table that small even if there are indexes. (I'm not sure about SQL Server specifically.)
I would be more concerned about the actual numbers in the query plan.
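One way to get actual numbers rather than percentages in SQL Server is to turn on the IO and time statistics for the session and run the query from the question. The per-table logical reads and the elapsed time then appear in the Messages output.

```sql
SET STATISTICS IO ON;    -- reports logical/physical reads per table
SET STATISTICS TIME ON;  -- reports parse, compile and execution times

select l.team1Score, l.team2Score, ls.team1ExternalID, ls.team2ExternalID,
       et.eventCategoryID, e.eventID, ls.statusCode
from livescoretracking l (nolock)
inner join liveScores ls (nolock) on l.liveScoreID = ls.liveScoreID
inner join db1.dbo.events e on e.gameid = ls.gameid
inner join db1.dbo.eventtype et (nolock) on e.eventTypeID = et.eventTypeID
inner join eventCategoryPayTypeMappings ecb (nolock)
        on ( et.eventCategoryID = ecb.eventCategoryID
             and e.payTypeID = ecb.payTypeID
             and ecb.mainEvent = 1 )
where ls.gameID = 286711
order by l.dateinserted;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the seven-row table shows a large number of logical reads, that would suggest it is being scanned many times inside a loop rather than being expensive to scan once.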
Deadlocks are usually more indicative of a resource access ordering issue than a problem with query design in particular. I would look at the other participant(s) in the deadlock and take a look at what objects each transaction had locked that were required by the other(s). If you can reorder to ensure consistent access order you may be able to avoid contention issues entirely.
It really depends on how long the query takes from start to finish. 45% doesn't mean it's taking a long time if the query only takes, say, 10 ms. All it really says is that most of the time is spent doing the table scan, which is understandable.
Having an index may help when the table grows and is probably not a bad idea unless you know this table is not going to grow. However you will find that adding an index to a table with 7 records makes little to no difference to performance.
A table scan on a small table is not a bad thing - If it fits in a single read into the cache the optimizer will calculate that a table scan costs less than reading through an index chain.
I would only recommend a clustered index if you want to help ensure that the contents will 'tend' to be sorted that way (though you will need an explicit ORDER BY to guarantee that).

Should I use Query Hint Fast number_rows / FASTFIRSTROW?

I was reading over the documentation for query hints:
http://msdn.microsoft.com/en-us/library/ms181714(SQL.90).aspx
And noticed this:
FAST number_rows
Specifies that the query is optimized for fast retrieval of the first number_rows. This is a nonnegative integer. After the first number_rows are returned, the query continues execution and produces its full result set.
So when I'm doing a query like:
Select Name from Students where ID = 444
Should I bother with a hint like this? Assuming SQL Server 2005, when should I?
-- edit --
Also should one bother when limiting results:
Select top 10 * from Students OPTION (FAST 10)
The FAST hint only makes sense on complex queries where there are multiple alternatives the optimizer could choose from. For a simple query like your example it doesn't help with anything, the query optimizer will immediately determine that there is a trivial plan (seek in ID index, lookup Name if not covering) to satisfy the query and go for it. Even if no index exists on ID, the plan is still trivial (probably clustered scan).
To give an example where FAST would be useful, consider a join between A and B with an ORDER BY constraint. Say evaluating B first and nested-looping into A honors the ORDER BY constraint, so it produces results fast (no SORT necessary), but is more costly because of cardinality (B has many records that match the WHERE, while A has few). On the other hand, evaluating A first and nested-looping into B would produce a plan that does less IO and hence is faster overall, but the result would have to be sorted first, and the SORT can only start after the join is evaluated, so the first result will come very late. The optimizer would normally pick the second plan because it is more efficient overall. The FAST hint would cause the optimizer to pick the first plan, because it produces results faster.
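A sketch of the kind of query described, with hypothetical tables and columns A(id, flag) and B(a_id, sort_key, payload), and assuming an index on B(sort_key) so that reading B in order avoids the sort:

```sql
-- Hypothetical schema: A(id, flag), B(a_id, sort_key, payload).
SELECT b.sort_key, b.payload
FROM   B AS b
JOIN   A AS a ON a.id = b.a_id
WHERE  a.flag = 1
ORDER  BY b.sort_key
OPTION (FAST 10);   -- optimize for returning the first 10 rows quickly
```

Without the hint the optimizer is free to drive from A and sort afterwards; with it, a plan that streams rows in sort_key order becomes more attractive, at the possible cost of a longer total run time.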
When using TOP x, there's no benefit of also using OPTION FAST x. The query optimizer already makes its decisions based on how many rows you are retrieving. Same goes for trivial queries, such as querying for a particular value from a unique index.
Other than that, OPTION FAST x could help when you know the number of results is likely below x, but the query optimizer does not. Of course, if the query optimizer is choosing poor paths for complex queries with few results, your statistics may need to be updated. And if you guess wrong on x, the query may end up taking longer--almost always a risk when giving hints.
The above statement has not been tested--it may be that all queries take just as long to fully execute, if not longer. Getting the first 10 rows fast is great if there are only 8 rows, but theoretically the query still has to execute fully before finishing. The benefit I'm thinking may be there because the query execution takes a different path expecting fewer total records, when in fact it's really trying to get the first x faster. Those two types of optimizations may not be in alignment.
For that particular query, certainly not! It's only going to return one row — the row with ID = 444. SQL Server will select that row as efficiently as it can.
FAST 10 might be used in a situation where you could make use of the first 10 rows immediately, even as you continue to wait for further results.