I have a table with 226 million rows that has a varchar2(2000) column. The first 10 characters are indexed using a functional index SUBSTR("txtField",1,10).
I am running a query such as this:
select count(1)
from myTable
where SUBSTR("txtField",1,10) = 'ABCDEFGHIJ';
The value does not exist in the database so the return in "0".
The explain plan shows that the operation performed is "INDEX (RANGE SCAN)" which I would assume and the cost is 4. When I run this query it takes on average 114 seconds.
If I change the query and force it to not use the index:
select count(1)
from myTable
where SUBSTR("txtField",1,9) = 'ABCDEFGHI';
The explain plan shows the operation will be a "TABLE ACCESS (FULL)" which makes sense. The cost is 629,000. When I run this query it takes on average 103 seconds.
I am trying to understand how scanning an index can take longer than reading every record in the table and performing the substr function on a field.
Followup:
There are 230M+ rows in the table and the query returns 17 rows; I selected a new value that is in the database. Initially I was executing with a value that was not in the database and returned zero rows. It seems to make no difference.
Querying for information on the index yields:
CLUSTERING_FACTOR=201808147
LEAF_BLOCKS=1131660
I am running the query with AUTOTRACE ON and the gather_plan_statistics and will add those results when they are available.
Thanks for all the suggestions.
There's a lot of possibilities.
You need to look at the actual execution plan, though.
You can run the query with the /*+ gather_plan_statistics */ hint, and then execute:
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
You should also look into running a trace/tkprof to see what is actually happening - your DBA should be able to assist you with this.
Related
I am running a query which is selecting data on the basis of joins between 6-7 tables. When I execute the query it is taking 3-4 seconds to complete. But when I put a where clause on the fetched data it's taking more than one minute to execute. My query is fetching large amounts of data so I can't write it here but the situation I faced is explained below:
Select Category,x,y,z
from
(
---Sample Query
) as a
it's only taking 3-4 seconds to execute. But
Select Category,x,y,z
from
(
---Sample Query
) as a
where category Like 'Spart%'
is taking more than 2-3 minutes to execute.
Why is it taking more time to execute when I use the where clause?
It's impossible to say exactly what the issue is without seeing the full query. It is likely that the optimiser is pushing the WHERE into the "Sample query" in a way that is not performant. Possibly could be resolved by updating statistics on the table, but an easier option would be to insert the whole query into a temporary table, and filter from there.
Select Category,x,y,z
INTO #temp
from
(
---Sample Query
) as a
SELECT * FROM #temp WHERE category Like 'Spart%'
This will force the optimiser to tackle it in the logical order of pulling your data together before applying the WHERE to the end result. You might like to consider indexing the temp table's category field also.
If you're using MS SQL by checking the management studio actual execution plan it may already suggest an index creation
In any case, you should add to the index used by the query the column "Category"
If you don't have an index on that table create it composed by column "Category" and all the other columns used in join or where
bear in mind by using like 'text%' clause you could end in index scan and not index seek
i need advice how to get fastest result for querying on big size table.
I am using SQL Server 2012, my condition is like this:
I have 5 tables contains transaction record, each table has 35 millions of records.
All tables has 14 columns, the columns i need to search is GroupName, CustomerName, and NoRegistration. And I have a view that contains 5 of all these tables.
The GroupName, CustomerName, and NoRegistration records is not unique each tables.
My application have a function to search to these column.
The query is like this:
Search by Group Name:
SELECT DISTINCT(GroupName) FROM TransactionRecords_view WHERE GroupName LIKE ''+#GroupName+'%'
Search by Name:
SELECT DISTINCT(CustomerName) AS 'CustomerName' FROM TransactionRecords_view WHERE CustomerName LIKE ''+#Name+'%'
Search by NoRegistration:
SELECT DISTINCT(NoRegistration) FROM TransactionRecords_view WHERE LOWER(NoRegistration) LIKE LOWER(#NoRegistration)+'%'
My question is how can i achieve fastest execution time for searching?
With my condition right now, every time i search, it took 3 to 5 minutes.
My idea is to make a new tables contains the distinct of GroupName, CustomerName, and NoRegistration from all 5 tables.
Is my idea is make execution time is faster? or any other idea?
Thank you
EDIT:
This is query for view "TransactionRecords_view"
CREATE VIEW TransactionRecords_view
AS
SELECT * FROM TransactionRecords_1507
UNION ALL
SELECT * FROM TransactionRecords_1506
UNION ALL
SELECT * FROM TransactionRecords_1505
UNION ALL
SELECT * FROM TransactionRecords_1504
UNION ALL
SELECT * FROM TransactionRecords_1503
You must show sql of TransactionRecords_view. Do you have indexes? What is the collation of NoRegistration column? Paste the Actual Execution Plan for each query.
Ok, so you don't need to make those new tables. If you create Non-Clustered indexes based upon these fields it will (in effect) do what you're after. The index will only store data on the columns that you indicate, not the whole table. Be aware, however, that indexes are excellent to aid in SELECT statements but will negatively affect any write statements (INSERT, UPDATE etc).
Next you want to run the queries with the actual execution plan switched on. This will show you how the optimizer has decided to run each query (in the back end). Are there any particular issues here, are any of the steps taking up a lot of the overall operator cost? There are plenty of great instructional videos about execution plans on youtube, check them out if you haven't looked at exe plans before.
Did you try to check if there were missing indexes with the actual execution plan ?
Moreover, as you use clause on varchar, I've heard about Full-Text Search.. maybe it can be useful for you :
https://msdn.microsoft.com/en-us/library/ms142571(v=sql.120).aspx
I'm trying to perform the following query in Oracle.
select * from (select rownum r, account from fooTable) where r<5001;
It selects the 1st 5000 rows. I'm running into a problem that fooTable has a lot of data inside of it and this is really slowing down the query (35 million+ rows). According to the query analyzer it's performing a full table scan.
My question is, is there a way to speed up this statement? Since I'm only fetching the 1st N rows, is the full table scan necessary?
mj
I have found the /*+ FIRST_ROWS(n) */ hint to be very helpful in cases like this (such as for limiting pagination results). You replace n with whatever value you want.
select /*+ FIRST_ROWS(5000) */
account
from fooTable
where rownum <5000;
You still need the rownum predicate to limit rows, but the hint lets the optimizer know you only need a lazy fetch of n rows.
How to reduce the execution time of this simple SQL query in SQL Server?
select *
from companybackup
where tiRecordStatus = 1
and (iAccessCode < 3 or chUpdateBy = SUSER_SNAME())
It has nearly 38,681 rows and taking nearly 10 minutes 23 seconds. The table is having 50 columns and even I created indexes on all columns to reduce the time but it didn't succeed, even I checked with nolock option and all the available solutions but couldn't reduce the execution time.
What might be the issue?
If this is a critical query, you can create an index designed just for it:
CREATE INDEX ix_MyIndexNameHere ON CompanyBackup
(tiRecordStatus, iAccesscode, chUpdateBy)
This will still require a key lookup since you are returning all fields with *, but it's a lot better than a bunch of single-field indexes.
I am wondering what does the SUSER_SNAME() function do, now it gets executed on every row. Can you try getting that function return value to parameter and then execute the query?
If the function is costly this will reduce execution time by a huge margin.
edit. Can anyone tell me why I can not post some SQL clauses and some I can? Cant post DECLARE syntax.
You haven't given much info, but splitting OR into a union of two queries can often help (optimisers have trouble with OR):
select *
from companybackup
where tiRecordStatus = 1
and iAccessCode < 3
union
select *
from companybackup
where tiRecordStatus = 1
and chUpdateBy = SUSER_SNAME())
If this doesn't help, try creating these indexes:
create index idx1 on companybackup(tiRecordStatus, iAccessCode);
create index idx2 on companybackup(chUpdateBy, tiRecordStatus);
Then force a recalculation if the distribution of index values:
update statistics companybackup;
I'm selecting some rows from a table valued function but have found an inexplicable massive performance difference by putting SELECT TOP in the query.
SELECT col1, col2, col3 etc
FROM dbo.some_table_function
WHERE col1 = #parameter
--ORDER BY col1
is taking upwards of 5 or 6 mins to complete.
However
SELECT TOP 6000 col1, col2, col3 etc
FROM dbo.some_table_function
WHERE col1 = #parameter
--ORDER BY col1
completes in about 4 or 5 seconds.
This wouldn't surprise me if the returned set of data were huge, but the particular query involved returns ~5000 rows out of 200,000.
So in both cases, the whole of the table is processed, as SQL Server continues to the end in search of 6000 rows which it will never get to. Why the massive difference then? Is this something to do with the way SQL Server allocates space in anticipation of the result set size (the TOP 6000 thereby giving it a low requirement which is more easily allocated in memory)?
Has anyone else witnessed something like this?
Thanks
Table valued functions can have a non-linear execution time.
Let's consider function equivalent for this query:
SELECT (
SELECT SUM(mi.value)
FROM mytable mi
WHERE mi.id <= mo.id
)
FROM mytable mo
ORDER BY
mo.value
This query (that calculates the running SUM) is fast at the beginning and slow at the end, since on each row from mo it should sum all the preceding values which requires rewinding the rowsource.
Time taken to calculate SUM for each row increases as the row numbers increase.
If you make mytable large enough (say, 100,000 rows, as in your example) and run this query you will see that it takes considerable time.
However, if you apply TOP 5000 to this query you will see that it completes much faster than 1/20 of the time required for the full table.
Most probably, something similar happens in your case too.
To say something more definitely, I need to see the function definition.
Update:
SQL Server can push predicates into the function.
For instance, I just created this TVF:
CREATE FUNCTION fn_test()
RETURNS TABLE
AS
RETURN (
SELECT *
FROM master
);
These queries:
SELECT *
FROM fn_test()
WHERE name = #name
SELECT TOP 1000 *
FROM fn_test()
WHERE name = #name
yield different execution plans (the first one uses clustered scan, the second one uses an index seek with a TOP)
I had the same problem, a simple query joining five tables returning 1000 rows took two minutes to complete. When I added "TOP 10000" to it it completed in less than one second. It turned out that the clustered index on one of the tables was heavily fragmented.
After rebuilding the index the query now completes in less than a second.
Your TOP has no ORDER BY, so it's simply the same as SET ROWCOUNT 6000 first. An ORDER BY would require all rows to be evaluated first, and it's would take a lot longer.
If dbo.some_table_function is a inline table valued udf, then it's simply a macro that's expanded so it returns the first 6000 rows as mentioned in no particular order.
If the udf is multi valued, then it's a black box and will always pull in the full dataset before filtering. I don't think this is happening.
Not directly related, but another SO question on TVFs
You may be running into something as simple as caching here - perhaps (for whatever reason) the "TOP" query is cached? Using an index that the other isn't?
In any case the best way to quench your curiosity is to examine the full execution plan for both queries. You can do this right in SQL Management Console and it'll tell you EXACTLY what operations are being completed and how long each is predicted to take.
All SQL implementations are quirky in their own way - SQL Server's no exception. These kind of "whaaaaaa?!" moments are pretty common. ;^)
It's not necessarily true that the whole table is processed if col1 has an index.
The SQL optimization will choose whether or not to use an index. Perhaps your "TOP" is forcing it to use the index.
If you are using the MSSQL Query Analyzer (The name escapes me) hit Ctrl-K. This will show the execution plan for the query instead of executing it. Mousing over the icons will show the IO/CPU usage, I believe.
I bet one is using an index seek, while the other isn't.
If you have a generic client:
SET SHOWPLAN_ALL ON;
GO
select ...;
go
see http://msdn.microsoft.com/en-us/library/ms187735.aspx for details.
I think Quassnois' suggestion seems very plausible. By adding TOP 6000 you are implicitly giving the optimizer a hint that a fairly small subset of the 200,000 rows are going to be returned. The optimizer then uses an index seek instead of an clustered index scan or table scan.
Another possible explanation could caching, as Jim davis suggests. This is fairly easy to rule out by running the queries again. Try running the one with TOP 6000 first.