Different result size between SELECT * and SELECT COUNT(*) on Oracle - sql

I have an strange behavior on an oracle database. We make a huge insert of around 3.1 million records. Everything fine so far.
Shortly after the insert finished (around 1 too 10 minutes) I execute two statements.
SELECT COUNT(*) FROM TABLE
SELECT * FROM TABLE
The result from the first statement is fine it gives me the exact number of rows that was inserted.
The result from the second statement is now the problem. Depending on the time, the number of rows that are returned is for example around 500K lower than the result from the first statement. The difference of the two results is decreasing with time.
So I have to wait 15 to 30 minutes before both statements return the same number of rows.
I already talked with the oracle dba about this issue but he has no idea how this could happen.
Any ideas, questions or suggestions?
Update
When I select only an index column I get the correct row count.
When I instead select an non index column I get again the wrong row count.

That doesn't sounds like a bug to me, if I understood you correctly, it just takes time for Oracle to fetch the entire table . After all, 3 Mil is not a small amount.
As opposed to count, which brings 1 record with the total number of rows.
If after some waiting, the number of records being output equals to the number that the count query returns, then everything is fine.

Have you already verified with these things:
1- Count single column instead of * ALL to verify both result
2- You can verify both queries result by adding where clause and gradually select more rows by removing conditions so that you can get the issue where it is returning different value from both.

I think you should check Execution plan to identify missing indexes to improve performance.
Add missing Indexes and check the result.
Why missing Indexes are impotent:
To count row, Oracle engine no need to go throw paging operation. But while fetching all the details from a table, it requires to go through paging.
And paging process depends on indexes created on a table to fetch the data effectively and fast.
So to decrease time for your second statement, you should find missing indexes and create those indexes.
How to Find Missing Indexes:
You can start with DBA_HIST_ACTIVE_SESS_HISTORY, and look at all statements that contain that type of hint.
From there, you can pull the index name coming from that hint, and then do a lookup on dba_indexes to see if the index exists, is valid, etc.

Related

using SQL COUNT function or executing search query directly which is more efficient

Let's say i have a very big database , if i execute a search query directly then count the returned rows would it be more faster ? Or using COUNT(searchquery) then start executing query like ->
SELECT *
FROM TABLE
WHERE bla='blabla'
OFFSET 0 FETCH NEXT 20 ROWS ONLY
I searched for it but i couldn't find any solution.
Do the count in the database! It will be much faster.
First, a count(*) only returns one row and one value. That is much, much less data -- and much faster -- than returning all the rows.
Second, a count(*) does not reference any columns in the select, so the query can be better optimized. It might be possible to get the count without ever looking at the data pages.
It looks like you are doing paging. You need the total count to do display the total count and calculate the total number of pages to the user, yes?
Than Gordon's answer is the one to use.

how to speed up a clustered index scan while selecting all fields on range of rows or all the rows

I have a table
Books(BookId, Name, ...... , PublishedYear)
I do have about 30 fields in my Books table, where BookId is the primary key (Identity column). I have about 2 million records for this table.
I know select * is evil performance killer..
I have a situation to select range of rows or all the rows having all the columns in it.
Select * from Books;
this query takes more than 2 seconds to scan through the data page and get all the records. On checking the execution it still uses the Clustered index scan.
Obviously 2 seconds my not be that bad, however when this table has to be joined with other tables which is executed in batch is taking time over 15 minutes (There are no duplicate records though on the final result at completion as the count is matching). The join criteria is pretty simple and yields no duplication.
Excluding this table alone has the batch execution completed in sub seconds.
Is there a way to optimize this having said that I will have to select all the columns :(
Thanks in advance.
I've just run a batch against my developer instance, one SELECT specifying all Columns and one using *. There is no evidence (nor should there) that there is any difference aside from the raw parsing of my input. If I remember correctly, that old saying really means: Do not SELECT columns you are not using, they use up resources without benefit.
When you try to improve performance in your code, always check your assumptions, they might only apply to some older version (of sql server etc) or other method.

Is it possible to query a table without order columns by page

I've a big table which contains more than 100K records, in oracle. I want to get all of the records and save each row to a file with JDBC.
In order to make it faster, I want to create 100 threads to read the data from the table concurrently. I will get the total count of the records in the first sql, then split it to 100 pages, then get one page in a thread with a new connection.
But I've a problem, that there is no any column can be used to order. There is no column with sequence, no accurate timestamp. I can't use a sql query without order by clause to query, since there is no guarantee it will return the data with the same order every time (per this question).
So is it possible to solve it?
Finally, I used rowid to order:
select * from mytable order by rowid
It seems work well.

oracle 10 performance issue with Select * from

sql : select * from user_info where userid='1100907' and status='1'
userid is indexed, the table has less than 10000 rows and has a LOB column.
The sql takes 1 second (I got this by using "set timing on" in sqlplus). I tried to use all columns names to replace *,but still 1 second. After I removed the LOB column ,the sql takes 0.99 secs . When I reduced the number of columns by half, the time goes to halved too.
Finally, select userid from user_info where userid='1100907' and status='1' takes 0.01 seconds.
Can someone figure it out ?
Bear in mind that "wall clock performance testing" is unreliable. It is subject to ambient database conditions, and - when outputting to SQL*Plus - dependent on how long it takes to physically display the data. That might explain why selecting half the columns really has such a substantial impact on elapsed time. How many columns does this table have?
Tuning starts with EXPLAIN PLAN. This tool will show you how the database will execute your query. Find out more.
For instance, it is quicker to service this query
select userid from user_info
then this one
select * from user_info
because the database can satisfy the first query with information from the index on userid, without touching the table at all.
edit
"Can you tell me why sqlplus print
column names many many times other
than just returning result"
This is related to paging. SQLPlus repeats the column headers every time it throws a page. You can suppress this behaviour with either of these SQLPlus commands:
set heading off
or
set pages n
In that second case, make n very big (e.g. 2000) or zero.
Perhaps you have 100 columns in the user_info table? If so, how many of those columns do you actually need in the query?

What's the most efficient way to check the presence of a row in a table?

Say I want to check if a record in a MySQL table exists. I'd run a query, check the number of rows returned. If 0 rows do this, otherwise do that.
SELECT * FROM table WHERE id=5
SELECT id FROM table WHERE id=5
Is there any difference at all between these two queries? Is effort spent in returning every column, or is effort spent in filtering out the columns we don't care about?
SELECT COUNT(*) FROM table WHERE id=5
Is a whole new question. Would the server grab all the values and then count the values (harder than usual), or would it not bother grabbing anything and just increment a variable each time it finds a match (easier than usual)?
I think I'm making a lot of false assumptions about how MySQL works, but that's the meat of the question! Where am I wrong? Educate me, Stack Overflow!
Optimizers are pretty smart (generally). They typically only grab what they need so I'd go with:
SELECT COUNT(1) FROM mytable WHERE id = 5
The most explicit way would be
SELECT WHEN EXISTS (SELECT 1 FROM table WHERE id = 5) THEN 1 ELSE 0 END
If there is an index on (or starting with) id, it will only search, with maximum efficiency, for the first entry in the index it can find with that value. It won't read the record.
If you SELECT COUNT(*) (or COUNT anything else) it will, under the same circumstances, count the index entries, but not read the records.
If you SELECT *, it will read all the records.
Limit your results to at most one row by appending LIMIT 1, if all you want to do is check the presence of a record.
SELECT id FROM table WHERE id=5 LIMIT 1
This will definitely ensure that no more than one row is returned or processed. In my experience, LIMIT 1 (or TOP 1 depending in the DB) to check for existence of a row makes a big difference in terms of performance for large tables.
EDIT: I think I misread your question, but I'll leave my answer here anyway if it's of any help.
I would think this
SELECT null FROM table WHERE id = 5 LIMIT 1;
would be faster than this
SELECT 1 FROM table WHERE id = 5 LIMIT 1;
but the timer says the winner is "SELECT 1".
For the first two queries, most people will generally say, always specify exactly what you need and leave the rest. Effort isn't all specific as bandwidth could be spent in returning data that you aren't even going to do anything with.
As for the previous answer will do for your result set, unless you're dealing with a language that supports affected rows. This can sometimes work when getting data to collect information on how many rows were returned in the last query. You'll need to look at your interface documentation as to how to get that information.
The difference between your 3 queries depends on how you've built your index. Only returning the primary key is likely to be faster as MySQL will have your index in memory, and not have to hit disk. Adding the LIMIT 1 is also a good trick that will speed up the optimizer significantly in early 5.0.x branches and earlier.
try EXPLAIN SELECT id FROM table WHERE id=5 and check the Extras column for the presence of USING INDEX. If its there, then you're query is coming straight from the index, and is going to be much faster.