select *
from customers
where column1 = 'test'
limit 5;
I just need 5 records. Will the execution engine stop running after it finds 5 records that match the condition?
I am working on a table with millions of records, and this simple select statement with LIMIT is taking ~20 minutes.
Can I improve the performance of this query?
Make sure that you have an index on column1. If not, the engine has to scan ALL records, starting with the first one, until it finds 5 matching rows. If you know that more than this single column will match your desired rows and also exclude other rows, you could create a compound index consisting of more than one column. You could also consider partitioning your table.
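A minimal sketch, assuming MySQL-style syntax (since the query uses LIMIT) and made-up index names:

CREATE INDEX idx_customers_column1 ON customers (column1);

-- Compound variant, assuming a second, hypothetical filter column column2:
CREATE INDEX idx_customers_col1_col2 ON customers (column1, column2);

With an index in place, the engine can walk the matching index entries and stop after the fifth one instead of scanning the whole table.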
Related
I am trying to fetch the first 5 distinct rows using the query SELECT DISTINCT col_A from table_name fetch first 5 rows only;
But the table has millions of rows, so DISTINCT is expensive: it scans the whole table just to produce 5 rows, taking a lot of time, around 200 seconds for me.
Is there a workaround or subquery for this?
The problem here is that you don't know whether a value is unique unless you check all the other rows. I believe your only solution would be to index the column so that non-distinct values are stored together. That might buy you some efficiency when searching; however, it will cost you when inserting data.
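A hedged sketch of that idea, using the names from the question (whether the optimizer actually reads distinct values off the index depends on the engine and its statistics):

CREATE INDEX idx_table_name_col_a ON table_name (col_A);

-- With the index, the engine can often read the distinct values from the
-- sorted index entries instead of scanning every row of the table:
SELECT DISTINCT col_A FROM table_name FETCH FIRST 5 ROWS ONLY;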
I have some strange behavior on an Oracle database. We do a huge insert of around 3.1 million records. Everything is fine so far.
Shortly after the insert finishes (around 1 to 10 minutes later), I execute two statements.
SELECT COUNT(*) FROM TABLE
SELECT * FROM TABLE
The result of the first statement is fine: it gives me the exact number of rows that were inserted.
The result of the second statement is the problem. Depending on the timing, the number of rows returned is, for example, around 500K lower than the result of the first statement. The difference between the two results decreases with time.
So I have to wait 15 to 30 minutes before both statements return the same number of rows.
I already talked with the Oracle DBA about this issue, but he has no idea how it could happen.
Any ideas, questions or suggestions?
Update
When I select only an indexed column, I get the correct row count.
When I instead select a non-indexed column, I again get the wrong row count.
That doesn't sound like a bug to me. If I understood you correctly, it just takes time for Oracle to fetch the entire table. After all, 3 million rows is not a small amount.
COUNT, by contrast, returns a single record with the total number of rows.
If, after some waiting, the number of records output equals the number that the count query returns, then everything is fine.
Have you already verified these things?
1- Count a single column instead of * to compare both results (see the sketch below).
2- Verify both query results by adding a WHERE clause, then gradually select more rows by removing conditions, so you can narrow down where the two queries start returning different values.
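A minimal sketch of the first check (the table and column names are placeholders; note that COUNT of a column skips NULLs, so a nullable column can legitimately produce a lower count):

SELECT COUNT(*) FROM my_table;            -- counts every row
SELECT COUNT(some_column) FROM my_table;  -- counts only rows where some_column IS NOT NULL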
I think you should check Execution plan to identify missing indexes to improve performance.
Add missing Indexes and check the result.
Why missing indexes are important:
To count rows, the Oracle engine does not need to go through a paging operation. But when fetching all the details from a table, it has to page through the data.
And that paging process depends on the indexes created on the table to fetch the data effectively and quickly.
So to decrease the time taken by your second statement, you should find the missing indexes and create them.
How to Find Missing Indexes:
You can start with DBA_HIST_ACTIVE_SESS_HISTORY and look at all statements that contain an index hint.
From there, you can pull the index name from the hint and then do a lookup on dba_indexes to see whether the index exists, is valid, etc.
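A hedged sketch of that lookup (assumes the AWR views are licensed and populated; the hint pattern and the index name are illustrative only):

-- Find recent statements whose text contains an INDEX hint:
SELECT DISTINCT h.sql_id, t.sql_text
FROM dba_hist_active_sess_history h
JOIN dba_hist_sqltext t ON t.sql_id = h.sql_id
WHERE UPPER(t.sql_text) LIKE '%INDEX(%';

-- Then check whether a hinted index actually exists and is usable:
SELECT index_name, status
FROM dba_indexes
WHERE index_name = 'IDX_FROM_HINT';  -- hypothetical name pulled from a hint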
I have a table
Books(BookId, Name, ...... , PublishedYear)
I do have about 30 fields in my Books table, where BookId is the primary key (Identity column). I have about 2 million records for this table.
I know SELECT * is an evil performance killer...
I have a situation where I need to select a range of rows, or all the rows, with all the columns included.
Select * from Books;
This query takes more than 2 seconds to scan through the data pages and get all the records. Checking the execution plan, it still uses a clustered index scan.
Obviously 2 seconds may not be that bad; however, when this table has to be joined with other tables in a batch execution, it takes over 15 minutes (there are no duplicate records in the final result, as the counts match). The join criteria are pretty simple and yield no duplication.
Excluding this table alone, the batch execution completes in sub-seconds.
Is there a way to optimize this, given that I will have to select all the columns? :(
Thanks in advance.
I've just run a batch against my developer instance, one SELECT specifying all columns and one using *. There is no evidence (nor should there be) of any difference aside from the raw parsing of my input. If I remember correctly, that old saying really means: do not SELECT columns you are not using; they use up resources without benefit.
When you try to improve performance in your code, always check your assumptions; they might only apply to some older version (of SQL Server, etc.) or a different method. A quick way to test this yourself is sketched below.
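A minimal sketch of such a test in SQL Server (the explicit column list is truncated here; in a real test you would spell out all 30 columns so both statements return the same data):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM Books;                             -- star version
SELECT BookId, Name, PublishedYear FROM Books;   -- explicit column list (truncated)

-- Compare logical reads and elapsed time in the Messages output; with
-- identical column sets the two should be effectively the same.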
I have created a table as below
create table T1(num varchar2(20))
Then I inserted 300,000 (3 lac) numbers into the above table, so now it looks like below:
num
1
2
3
.
.
300000
Now if I do
select * from T1
then it takes 1 min 15 sec to completely fetch the records. I created an index on column num, so I expected the query below to fetch the 300,000 records faster, but it also takes 1 min 15 sec:
select * from T1 where num between '1' and '300000'
So how has the index improved my retrieval process?
The index does not improve the retrieval process when you are trying to fetch all rows.
The index makes it possible to find a subset of rows much more quickly.
An index can help if you want to retrieve a few rows from a large table. But since you retrieve all rows and since your index contains all the columns of your table, it won't speed up the query.
Furthermore, you don't tell us what tool you use to retrieve the data. I guess you use SQL Developer or Toad. So what you measure is the time it takes SQL Developer or Toad to store 300,000 rows in memory in such a way that they can be easily displayed on screen in a scrollable table. You aren't really measuring how long it takes to retrieve them.
To get a test of the effects of having an index in place you might want to try a query such as
SELECT *
FROM T1
WHERE NUM IN ('288888', '188888', '88888')
both with the index in place, and again after removing the index. You should also collect statistics on the table before running the query with the index in place, or you may still get a plan that performs a full table scan. Share and enjoy.
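A minimal sketch of that statistics step, assuming the standard DBMS_STATS package and that T1 belongs to the current user:

-- Gather optimizer statistics so the planner knows the table's shape:
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T1');

-- Then compare the plans with and without the index:
EXPLAIN PLAN FOR SELECT * FROM T1 WHERE NUM IN ('288888', '188888', '88888');
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);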
Let's say we have
SELECT * FROM A INNER JOIN B ON [....]
Assuming A has 2 rows and B contains 1M rows, including 2 rows linked to A:
B will be scanned only once, with an "actual # of rows" of 2, right?
If I add a WHERE on table B:
SELECT * FROM A INNER JOIN B ON [....] WHERE B.Xyz > 10
The WHERE will actually be executed before the join... So if the WHERE returns 1000 rows, the "actual # of rows" for B will be 1000...
I don't get it... shouldn't it be <= 2???
What am I missing? Why does the optimizer proceed that way?
(SQL 2008)
Thanks
The optimizer will proceed whichever way it thinks is faster. That means if the Xyz column is indexed but the join column is not, it will likely apply the Xyz filter first. Or if your statistics are bad, so it doesn't know that the join filter would pare B down to just two rows, it will do the WHERE clause first.
It's based entirely on what indexes are available for the optimizer to use. Also, there is no reason to believe that the DB engine will execute the WHERE before another part of the query. The query optimizer is free to execute the query in any order it likes as long as the correct results are returned. Again, the way to properly optimize this type of query is with strategically placed indexes, as sketched below.
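A hedged sketch of such indexes (the join column is elided in the question, so JoinCol is a hypothetical stand-in; SQL Server syntax):

-- Support the join lookup from A into B:
CREATE INDEX IX_B_JoinCol ON B (JoinCol);

-- Support the WHERE filter; INCLUDE lets the index cover the join column too:
CREATE INDEX IX_B_Xyz ON B (Xyz) INCLUDE (JoinCol);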
The "scanned only once" is a bit misleading. A table scan is a horrendously expensive thing in SQL Server. At least up to SS2005, a table scan requires a read of all rows into a temporary table, then a read of the temporary table to find rows matching the join condition. So in the worst case, your query will read and write 1M rows, then try to match 2 rows to 1M rows, then delete the temporary table (that last bit is probably the cheapest part of the query). So if there are no usable indexes on B, you're just in a bad place.
In your second example, if B.Xyz is not indexed, the full table scan happens and there's a secondary match from 2 rows to 1000 rows - even less efficient. If B.Xyz is indexed, there should be an index lookup and a 2:1000 match - much faster & more efficient.
'course, this assumes the table stats are relatively current and no options are in effect that change how the optimizer works.
EDIT: Is it possible for you to "unroll" the A rows and use them as a static condition in a no-JOIN query on B? We've used this in a couple of places in our application, where we join small tables (<100 rows) to large ones (>100M rows), to great effect.
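A hedged sketch of the "unroll" idea (the join column and literal values are hypothetical, since the actual ON clause is elided in the question):

-- Suppose the two rows in A carry join-key values 42 and 99; skip the JOIN entirely:
SELECT *
FROM B
WHERE B.JoinCol IN (42, 99)
  AND B.Xyz > 10;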