I am doing performance tests on a table and for that I insert several millions of rows with fake data and perform the query.
Initially the response time is severely degraded, but I retested several hours later, and the response time improved significantly.
Is Oracle busy with some activities just after my insertion but these activities are finished after some time? I need an explanation for this behavior.
Thanks!
When you first run a query, the SQL Engine has to do everything including, but not limited to:
Parse the query;
Generate a plan for the query;
Perform IO to load the blocks of data from the datafile;
Transform the data;
Store the result set in the result cache;
Return the results to the client.
When you run a query for a second time, the SQL engine can:
Check whether the query has been run before and if the table statistics is unchanged and, if so, load the previous plan;
Check if the underlying data is unchanged and if the result of the previous query is in the result cache and, if so, return the result directly from the result cache;
If something has changed, can check if the blocks are still in its local cache and if they are not stale then it does not have to perform IO from the datafiles.
Therefore, on a second execution of a query there are lots of optimisations that can be made that short cut through some of the expensive operations that must be made on the first run of a query.
A simulated example of the performance optimisations that can be made using the results cache is in this db<>fiddle.
Related
I am trying to optimise some DBT models by using incremental runs on partitioned data and ran into a problem - the suggested approach that I've found doesn't seem to work. By not working I mean that it doesn't decrease the processing load as I'd expect.
Below is the processing load of a simple select of the partitioned table:
unfiltered query processing load
Now I try to select only new rows added since the last incremental run:
filtered query
You can notice, that the load is exactly the same.
However, the select inside the WHERE is rather lightweight:
selecting only the max date
And when I fill in the result of that select manually, the processing load is suddenly minimal, what I'd expect:
expected processing load
Finally, both tables (the one I am querying data from, and the one I am querying max(event_time)) are configured in exactly the same way, both being partitioned by DAY on field event_time:
config on tables
What am I doing wrong? How could I improve my code to actually make it work? I'd expect the processing load to be similar to the one using an explicit condition.
P.S. apologies for posting links instead of images. My reputation is too low, as this is my first question here.
Since the nature of query is dynamic, i.e. the where condition is not absolute(constant), BigQuery cannot estimate the accurate processed data before execution.
This is due the fact that max(event_time) is not constant and might change, hence affecting the size of the data to be fetched by the outer query.
For estimation purposes, try one of these 2 approaches:
Replace the inner query by a constant value and check the estimated bytes to be processed.
Try running the query once and check the processed data under Query results -> Job Information ->Bytes processed and Bytes billed
Oracle by default caching the query results and function results.
I have noticed this with AutoTrace utility, Where the physical reads are huge on first execution, but from next execution onwards it reduced dramatically.
Than what is the importance of query result cache, function result Cache?
Could some one help on this to understand better.
It's very simple: when you have query result cache the query will most likely not executed once again - the result will be provided from that cache. In case of absence of that functionality Oracle will do query on cached data (buffer cache) that is more expensive. Query result cache might be implemented on client side as well that may eliminate even round trip to server.
I have to benchmark a query - currently I need to know how adding parameter to select result set(FIELD_DATE1) will affect sql execution time. There is administration restrictions in db so I can not use debug. So I wrote a query:
SELECT COUNT(*), MIN(XXXT), MAX(XXXT)
FROM ( select distinct ID AS XXXID, sys_extract_utc(systimestamp) AS XXXT
, FIELD_DATE1 AS XXXUT
from XXXTABLE
where FIELD_DATE1 > '20-AUG-06 02.23.40.010000000 PM' );
Will output of query show real times of query execution
There is a lot to learn when it comes to benchmarking in Oracle. I recommend you to begin with the items below even though It worries me that you might not have enough restrictions in db since some of these features could require extra permissions:
Explain Plan: For every SQL statement, oracle has to create an execution plan, the execution plan defines how to information will be read/written. I.e.: the indexes to use, the join method, the sorts, etc.
The Explain plan will give you information about how good your query is and how it is using the indexes. Learning the concept of a query cost for this is key, so take a look to it.
TKPROF: it's an Oracle tool that allows you to read oracle trace files. When you enable timed statistics in oracle you can trace your sql statements, the result of this traces are put in files; You can read these files with TKPROF.
Among the information TKPROF will let you see is:
count = number of times OCI procedure was executed
cpu = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk = number of physical reads of buffers from disk
query = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows = number of rows processed by the fetch or execute call
See: Using SQL Trace and TKPROF
It's possible in this query that SYSTIMESTAMP would be evaluated once, and the same value associated with every row, or that it would be evaluated once for each row, or something in-between. It also possible that all the rows would be fetched from table, then SYSTIMESTAMP evaluated for each one, so you wouldn't be getting an accurate account of the time taken by the whole query. Generally, you can't rely on order of evaluation within SQL, or assume that a function will be evaluated once for each row where it appears.
Generally the way I would measure execution time would be to use the client tool to report on it. If you're executing the query in SQLPlus, you can SET TIMING ON to have it report the execution time for every statement. Other interactive tools probably have similar features.
I need to give away results of the query in response to a following URL:
http://foo.com/search?from=20&perpage=10
What is the best practice from performance point of view?
Run the query every time with WHERE row > N and row <= K
Run query once and keep all the result for certain amount of time
What else could be done?
If your query will produce a very small amount of data you can consider to cache the results. But think about this:
High performance pagination isn't trivial. It can be done database-side with better performance (if you're using Microsoft SQL Server 2012 it has its own extensions for this).
Pagination is done by users and they need time to read data and click the "Next page" button. Performance hit of a new query won't be big and the database will handle its own cache better than what we can do.
Cache a large amount of data is useless, memory consuming and less efficient than a new query.
Then my answer is run the query every time. Your database will handle that better than you and it'll scale with your data (otherwise what may work for 500 record will be broken when your system will have 1000000 records).
To summarize: database engines are more efficient than us to cache data and they scale better, (usually) a good index is all they need to perform well.
When I run a certain stored procedure for the first time it takes about 2 minutes to finish. When I run it for the second time it finished in about 15 seconds. I'm assuming that this is because everything is cached after the first run. Is it possible for me to "warm the cache" before I run this procedure for the first time? Is the cached information only used when I call the same stored procedure with the same parameters again or will it be used if I call the same stored procedure with different params?
When you peform your query, the data is read into memory in blocks. These blocks remain in memory but they get "aged". This means the blocks are tagged with the last access and when Sql Server requires another block for a new query and the memory cache is full, the least recently used block (the oldest) is kicked out of memory. (In most cases - full tables scan blocks are instantly aged to prevent full table scans overrunning memory and choking the server).
What is happening here is that the data blocks in memory from the first query haven't been kicked out of memory yet so can be used for your second query, meaning disk access is avoided and performance is improved.
So what your question is really asking is "can I get the data blocks I need into memory without reading them into memory (actually doing a query)?". The answer is no, unless you want to cache the entire tables and have them reside in memory permanently which, from the query time (and thus data size) you are describing, probably isn't a good idea.
Your best bet for performance improvement is looking at your query execution plans and seeing whether changing your indexes might give a better result. There are two major areas that can improve performance here:
creating an index where the query could use one to avoid inefficient queries and full table scans
adding more columns to an index to avoid a second disk read. For example, you have a query that returns columns A, and B with a where clause on A and C and you have an index on column A. Your query will use the index for column A requiring one disk read but then require a second disk hit to get columns B and C. If the index had all columns A, B and C in it the second disk hit to get the data can be avoided.
I don't think that generating the execution plan will cost more that 1 second.
I believe that the difference between first and second run is caused by caching the data in memory.
The data in the cache can be reused by any further query (stored procedure or simple select).
You can 'warm' the cache by reading the data through any select that reads the same data. But that will even cost about 90 seconds as well.
You can check the execution plan to find out which tables and indexes your query uses. You can then execute some SQL to get the data into the cache, depending on what you see.
If you see a clustered index seek, you can simply do SELECT * FROM my_big_table to force all the table's data pages into the cache.
If you see a non-clustered index seek, you could try SELECT first_column_in_index FROM my_big_table.
To force a load of a specific index, you can also use the WITH(INDEX(index)) table hint in your cache warmup queries.
SQL server cache data read from disc.
Consecutive reads will do less IO.
This is of great help since disk IO is usually the bottleneck.
More at:
http://blog.sqlauthority.com/2014/03/18/sql-server-performance-do-it-yourself-caching-with-memcached-vs-automated-caching-with-safepeak/
The execution plan (the cached info for your procedure) is reused every time, even with different parameters. It is one of the benefits of using stored procs.
The very first time a stored procedure is executed, SQL Server generates an execution plan and puts it in the procedure cache.
Certain changes to the database can trigger an automatic update of the execution plan (and you can also explicitly demand a recompile).
Execution plans are dropped from the procedure cache based an their "age". (from MSDN: Objects infrequently referenced are soon eligible for deallocation, but are not actually deallocated unless memory is required for other objects.)
I don't think there is any way to "warm the cache", except to perform the stored proc once. This will guarantee that there is an execution plan in the cache and any subsequent calls will reuse it.
more detailed information is available in the MSDN documentation: http://msdn.microsoft.com/en-us/library/ms181055(SQL.90).aspx