Oracle 10 optimizer from RULE to COST: why?

Oracle decided to dismiss the rule-based optimizer from version 10g, leaving the cost-based one as the only choice.
I think a rule-based optimizer has the invaluable advantage of always being predictable. I've seen Oracle 10g, by contrast, change execution plans from one night to the next, leading to turtle-like performance.
What could be the rationale behind this change?

Because everything you can do with the RBO can be done with the CBO.
The CBO can be rule-based too — more than that, you may decide the "rules" yourself.
To create your own "rules", you hint your query or do a CREATE OUTLINE which will hint it for you. As a result, your execution plan is stable.
The outlines are stored in a system schema called OUTLN, and they are editable.
As for me, I always supply hints to my queries running in a production database.
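For illustration, a minimal sketch of both approaches (the table, index, and outline names are hypothetical):

    -- an inline hint pinning the access path:
    SELECT /*+ INDEX(e emp_name_ix) */ *
    FROM   employees e
    WHERE  e.last_name = 'SMITH';

    -- or a stored outline that hints the statement for you:
    CREATE OR REPLACE OUTLINE emp_by_name
    FOR CATEGORY prod
    ON SELECT * FROM employees e WHERE e.last_name = 'SMITH';

Once the outline category is enabled (ALTER SESSION SET use_stored_outlines = prod), the saved plan is used instead of whatever the CBO would pick that day.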

The RBO is often predictably bad as well as predictably good. It also doesn't support partitioning and some other database features. The CBO is much better, and as Quassnoi says, plan stability is a feature of the CBO too.

The RBO has been deprecated for a long time; it was really just retained for backwards compatibility with legacy applications. Oracle have been announcing the demise of the RBO since (IIRC) version 8, which came out about 10 years ago.
The RBO was deterministic, but not all that clever. Oracle was originally designed before cost-based optimisers were even available, let alone a mature technology. The RBO has been frozen for a long time and does not support a lot of features of modern Oracle engines.
Cost-based optimisation is much smarter. However, if you had queries optimised for the RBO, they might not play nicely with the CBO. You will probably have to re-write or hint your queries appropriately to tune them for the CBO. There is also a facility to specify a query plan and override the CBO with that plan. This will give you deterministic query execution with stable plans.

(I am not a DBA.)
My understanding is that Oracle has been moving away from the RBO for a long time in favor of the CBO. It seems useful to me to stop supporting a feature that is no longer in active development (given a long enough deprecation period) so that everyone is using the most effective features.
It's interesting that you called predictability an "invaluable" effect of using the rule-based optimizer. It seems like when the data changes enough to make an execution plan sub-optimal, it would be best to switch to a new one. Only in the case you alluded to, where the optimizer flip-flops between two execution plans, would there be a problem with picking the best plan for the data you are actually querying. I'm not sure what advantage predictability has in more normal situations.
Ending support for the outdated optimizer ought to free up resources for developing the newer one.

The reason they moved to cost-based optimization is that it can perform better, since it's based on analyzing statistical information that the rule-based optimizer does not have.
To make the CBO work better, it's important to understand the role that statistics gathering plays in execution plan changes, which directly affect performance. For one thing, running statistics more or less frequently could help you. Here is a good article about the CBO and statistics:
Optimizing Oracle Optimizer Statistics
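For example, here is a sketch of refreshing statistics on a single table with the standard DBMS_STATS package (the schema and table names are hypothetical):

    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname => 'APP',      -- hypothetical schema
        tabname => 'ORDERS',   -- hypothetical table
        cascade => TRUE);      -- also gather statistics on the table's indexes
    END;
    /

Stale or missing statistics are one of the most common reasons the CBO picks a plan that looks irrational.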

I think you should do rule-based programming: don't think about the situation, just follow a list of inviolate rules. No matter what the situation, no matter what you think is the better way, if the rules say use a FOR LOOP in case X, then you have to use a loop, even if you know there will only be one iteration: loop from 1 to 1.
Stipulate:
Every query has a best plan.
Every query optimizer will determine that plan x% of the time.
The RBO had nowhere else to go. Its percent accuracy is lower than the CBO's, to be sure, but it was never going to get any better. It was limited, like any rule-based system.

Related

Does Oracle12c optimizer adaptive features eliminate need for indexes?

My DBA thinks the new Oracle12c optimizer features means he doesn't need to add indexes on important columns anymore. I am having a hard time believing that can possibly be true. It only seems to improve how joins are performed based on historical statistics.
You are right, he has totally misunderstood. You can read about Adaptive Query Optimization in this Oracle white paper. It means that the optimizer can change its query execution plan while it is running. But if you don't create indexes on columns that could benefit from them, the optimizer will not be able to (adaptively or otherwise) choose a plan that uses those indexes!
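As a sketch, an index like this (names hypothetical) still has to be created by hand before any optimizer feature, adaptive or not, can take advantage of it:

    -- hypothetical table and column; no adaptive feature will conjure this up for you
    CREATE INDEX orders_customer_ix ON orders (customer_id);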

Optimizing SPARQL queries in MarkLogic

In many SPARQL systems, you can optimize your queries by re-ordering the triples in a graph pattern. In others, you cannot (the query engine optimizes them using some heuristics of its own).
In Jena, you can optimize queries by putting triple patterns that most restrict the result set early in the query (and making sure always to mention some variable that is already bound). In other query engines, this strategy doesn't work (since they process the query differently).
Apparently, MarkLogic SPARQL is sensitive to the ordering of the triple patterns (I have re-arranged some queries and found that they go faster or slower, by as much as 10x). But I can't find any rhyme or reason as to which orderings go faster or slower. The heuristics I have used successfully with Jena don't work with MarkLogic.
I have googled about and looked at the MarkLogic docs, and haven't found any advice about this. Have any of the MarkLogic query engine writers made any notes about this?
The optimization level gives a hint to the optimizer to tell it how much time to spend on finding the best query plan.
Any optimizer spends some time finding the best plan, but has to balance the time spent searching for that plan against the benefit of running a better one. Most of us have experienced this trade-off in real life (!)
Optimization level 1 says "do some work looking for the best plan, but don't go crazy". Level 2 says "do more work to find the best plan". Level 0 says "just take the query as it is".
For most queries level 1 is appropriate, and that's the default. If you have a particularly complex query, try it with level 2 and see if the extra time spent finding a plan to make the query faster actually pays off in overall query time. If you have very simple queries, try level 0.
I asked our experts on the PM and Engineering teams inside MarkLogic. I was told, "you should get the same performance no matter what order you have for patterns, since we have a sophisticated query optimizer. If that's not true, please file a bug." It would also be useful to know what specific version of MarkLogic you are using, what optimization level, and some sample queries.

SQL Optimization Strategy

I'm a student and I'm doing my database assignment.
I want to use indexing and query optimization for my database optimization strategy.
The problem is: how can I prove my strategy makes an improvement? My lecturer said that for query optimization I can prove it by calculation. Anyone got more ideas? What should I calculate?
What about indexing? I need evidence to prove it works. How?
In terms of evidence of optimization, you have to have instrumented test cases (e.g. so you can take timings accurately) and re-runnable test cases. Ideally, for a re-runnable set of test cases, you can also reset the database to a baseline so you can guarantee that the starting conditions of the data are the same for each test run.
You also need to understand, for each test case, some more subtle factors:
Are you running against a cold or a warm procedure cache?
Are you running against a cold or a warm data cache?
For larger datasets, are you using the exact same table, e.g. one where no page splits have occurred since the last run?
I would think a before and after explain plan would go a long way towards proving an improvement.
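In Oracle, for example, you can capture the plan before and after each change (the query below is a hypothetical stand-in for one of yours):

    EXPLAIN PLAN FOR
      SELECT * FROM orders WHERE customer_id = 42;  -- hypothetical query

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Comparing the estimated cost and the access paths (full scan vs. index) between the two plans is the kind of calculation your lecturer is probably after.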
See SQL Server Performance HERE.
Which DBMS are you using?
I suggest you take a look at what tracing options your DBMS product provides. For example, in Oracle you can use SQL Trace and parse the output using tkprof to provide you with the figures you'll need to prove that your database optimization strategy shows an improvement.
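A minimal sketch of that workflow (the trace file name is hypothetical):

    -- enable tracing for the current session:
    ALTER SESSION SET sql_trace = TRUE;
    -- ... run your test queries here ...
    ALTER SESSION SET sql_trace = FALSE;

    -- then, on the server, format the raw trace file, e.g.:
    --   tkprof orcl_ora_12345.trc report.txt sys=no

The tkprof report gives you parse/execute/fetch timings and block counts per statement, which are exactly the figures you can compare before and after adding an index.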

What are "SQL-Hints"?

I am an advocate of ORM-solutions and from time to time I am giving a workshop about Hibernate.
When talking about framework-generated SQL, people usually start talking about how they need to be able to use "hints", and this is supposedly not possible with ORM frameworks.
Usually something like: "We tried Hibernate. It looked promising in the beginning, but when we let it loose on our very very complex production database it broke down because we were not able to apply hints!".
But when asked for a concrete example, the memory of those people is suddenly not so clear any more ...
I usually feel intimidated, because the whole "hints" topic sounds like voodoo to me...
So can anybody enlighten me? What is meant by SQL-hints or DB-Hints?
The only thing I know, that is somehow "hint-like" is SELECT ... FOR UPDATE. But this is supported by the Hibernate-API...
A SQL statement, especially a complex one, can actually be executed by the DB engine in any number of different ways (which table in the join to read first, which index to use based on many different parameters, etc).
An experienced DBA can use hints to encourage the DB engine to choose a particular method when it generates its execution plan. You would normally only need to do this after extensive testing and analysis of the specific queries (because the DB engines are usually pretty darn good at figuring out the optimum execution plan).
Some MSSQL-specific discussion and syntax here:
http://msdn.microsoft.com/en-us/library/ms181714.aspx
Edit: some additional examples at http://geeks.netindonesia.net/blogs/kasim.wirama/archive/2007/12/31/sql-server-2005-query-hints.aspx
Query hints are used to guide the query optimiser when it doesn't produce sensible query plans by default. First, a bit of background on query optimisers:
Database programming is different from pretty much all other software development because it has a mechanical component. Disk seeks and rotational latency (waiting for a particular sector to arrive under the disk head) are very expensive in comparison to CPU. Different query resolution strategies will result in different amounts of I/O, often radically different amounts. Getting this right or wrong can make a major difference to the performance of the query. For an overview of query optimisation, see this paper.
SQL is declarative - you specify the logic of the query and let the DBMS figure out how to resolve it. A modern cost-based query optimiser (some systems, such as Oracle, also have a legacy query optimiser retained for backward compatibility) will run a series of transformations on the query. These maintain semantic equivalence but differ in the order and choice of operations. Based on statistics collected on the tables (sizes, distribution histograms of keys), the optimiser computes an estimate of the amount of work needed for each query plan. It selects the most efficient plan.
Cost-based optimisation is heuristic, and is dependent on accurate statistics. As query complexity goes up the heuristics can produce incorrect plans, which can potentially be wildly inefficient.
Query hints can be used in this situation to force certain strategies in the query plan, such as a type of join. For example, on a query that usually returns very small result sets you may wish to force a nested loops join. You may also wish to force a certain join order of tables.
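As a hedged illustration in SQL Server syntax (table names hypothetical), hints like these pin down both the join strategy and the join order:

    SELECT c.Name, o.Total
    FROM   Customers AS c            -- hypothetical tables
    JOIN   Orders    AS o ON o.CustomerID = c.CustomerID
    OPTION (LOOP JOIN, FORCE ORDER); -- nested loops, joined in the written order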
O/R mappers (or any tool that generates SQL) generate their own queries, which will typically not contain hinting information. If such a query runs inefficiently you have limited options, some of which are:
Examine the indexing on the tables. Possibly you can add an index. Some systems (recent versions of Oracle, for example) allow you to index joins across more than one table.
Some database management systems (again, Oracle comes to mind) allow you to manually associate a query plan with a specific query string. Query plans are cached by a hash value of the query. If the queries are parameterised, the base query string is constant and will resolve to the same hash value.
As a last resort, you can modify the database schema, but this is only possible if you control the application.
If you control the SQL you can hint queries. In practice it's fairly uncommon to actually need to do this. A more common failure mode of O/R mappers with complex database schemas is that they can make it difficult to express complex query predicates or to do complex operations over large bodies of data.
I tend to advocate using the O/R mapper for the 98% of work that it's suited for and dropping to stored procedures where they are the appropriate solution. If you really need to hint a query, then this might be the appropriate strategy. Unless there is something unusual about your application (for example, some sort of DSS), you should only need to escape from the O/R mapper in a minority of situations. You might also find (again, an example would be DSS tools working with the data in aggregate) that an O/R mapper is not really the appropriate strategy for the application.
While HINTS do as the other answers describe, you should only use them in rare, researched circumstances. 9 times out of 10 a HINT will result in a poor query plan. Unless you really know what you are doing, don't use them.
There is no such thing as "optimized SQL code", because the SQL code itself is never executed.
SQL code is translated into an execution plan by the optimizer. The optimizer will use the information it has to choose (among other things):
the order in which tables are involved
the join method for each involved table (nested/merge/hash)
how to access a table's data (direct table access/ index with bookmark lookup/direct index access) (scan/seek)
should parallelism be used, and when to end parallelism (gather streams)
Query hints allow a programmer to override (in most cases) or politely suggest (in other cases) the optimizer's choices.
Query hints can let you force off parallelism, force all joins to be implemented as nested loops, force one index to be used over another... as a few examples.
Since the optimizer is really good, if one overrides it, one is generally asking for a non-optimal plan. Query hints are best reserved for cases where the optimizer does not have the information it needs to make a good choice.
One place I use query hints is for table variables. Table variables are assumed to have 0 rows by the optimizer, so the optimizer always joins table variables using nested loops (the best join implementation for small numbers of rows). If I have a large table variable that is already ordered in a way favorable to a merge join, I can specify that a merge join be used by applying a query hint.
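A sketch of that situation in T-SQL (the objects are hypothetical):

    DECLARE @ids TABLE (id INT PRIMARY KEY);

    INSERT INTO @ids (id)
    SELECT CustomerID FROM dbo.Customers;  -- imagine this loads many rows

    -- the optimizer assumes @ids is tiny; override its join choice:
    SELECT c.*
    FROM   @ids AS i
    JOIN   dbo.Customers AS c ON c.CustomerID = i.id
    OPTION (MERGE JOIN);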
All modern RDBMSes have some sort of query optimizer that calculates the best query plan, i.e. the sequence of read/write operations needed to execute the SQL query.
Sometimes plans can be suboptimal, so RDBMS designers included "hints" in SQL. Hints are instructions you can embed in your SQL that affect the query optimizer. With hints you can instruct the query optimizer, for example, which indexes it should use, in what order data should be read from tables, ...
So, with hints you can resolve some bottlenecks that the query optimizer cannot solve by itself.
For example, here is a list of Oracle hints.

What is the purpose for using OPTION(MAXDOP 1) in SQL Server?

I have never clearly understood the usage of MAXDOP. I do know that it makes the query faster and that it is the last item that I can use for Query Optimization.
However, my question is, when and where it is best suited to use in a query?
As Kaboing mentioned, MAXDOP(n) actually controls the number of CPU cores that are being used in the query processor.
On a completely idle system, SQL Server will attempt to pull the tables into memory as quickly as possible and join between them in memory. It could be that, in your case, it's best to do this with a single CPU. This might have the same effect as using OPTION (FORCE ORDER), which forces the query optimizer to use the order of joins that you have specified. In some cases, I have seen OPTION (FORCE ORDER) reduce a query from 26 seconds to 1 second of execution time.
Books Online goes on to say that possible values for MAXDOP are:
0 - Uses the actual number of available CPUs depending on the current system workload. This is the default value and recommended setting.
1 - Suppresses parallel plan generation. The operation will be executed serially.
2-64 - Limits the number of processors to the specified value. Fewer processors may be used depending on the current workload. If a value larger than the number of available CPUs is specified, the actual number of available CPUs is used.
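For reference, the hint is simply appended to the end of the statement; a minimal sketch (the table is hypothetical):

    SELECT   CustomerID, SUM(TotalDue) AS Total
    FROM     Sales.Orders              -- hypothetical table
    GROUP BY CustomerID
    OPTION (MAXDOP 1);                 -- force a serial plan for this query only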
I'm not sure what the best usage of MAXDOP is, however I would take a guess and say that if you have a table with 8 partitions on it, you would want to specify MAXDOP(8) due to I/O limitations, but I could be wrong.
Here are a few quick links I found about MAXDOP:
Books Online: Degree of Parallelism
General guidelines to use to configure the MAXDOP option
This is a general rambling on parallelism in SQL Server; it might not answer your question directly.
From Books Online, on MAXDOP:
Sets the maximum number of processors the query processor can use to execute a single index statement. Fewer processors may be used depending on the current system workload.
See Rickie Lee's blog on parallelism and CXPACKET wait type. It's quite interesting.
Generally, in an OLTP database, my opinion is that if a query is so costly it needs to be executed on several processors, the query needs to be re-written into something more efficient.
Why do you get better results adding MAXDOP(1)? Hard to tell without the actual execution plans, but it might be as simple as the execution plan being totally different than without the OPTION, for instance using a different index or (more likely) JOINing differently, using MERGE or HASH joins.
As something of an aside, MAXDOP can apparently be used as a workaround to a potentially nasty bug:
Returned identity values not always correct
There are a couple of parallelization bugs in SQL Server with abnormal input. OPTION(MAXDOP 1) will sidestep them.
EDIT: This is old; my testing was done largely on SQL 2005. Most of these seem not to exist anymore, but every once in a while we question the assumption when SQL 2014 does something dumb, and we go back to the old way and it works. We never managed to demonstrate that it wasn't just bad plan generation in the more recent cases, though, since SQL Server can be relied on to get the old way right in newer versions. Since all cases were I/O-bound queries, MAXDOP 1 doesn't hurt.
Adding my two cents, based on a performance issue I observed.
If simple queries are getting parallelized unnecessarily, it can cause more problems than it solves. However, before adding MAXDOP to the query as a "knee-jerk" fix, there are some server settings to check.
In Jeremiah Peschka - Five SQL Server Settings to Change, MAXDOP and "COST THRESHOLD FOR PARALLELISM" (CTFP) are mentioned as important settings to check.
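A hedged sketch of inspecting and changing those two server-level settings with sp_configure (the values shown are placeholders, not recommendations):

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- example values only; tune these for your workload:
    EXEC sp_configure 'cost threshold for parallelism', 50;
    EXEC sp_configure 'max degree of parallelism', 8;
    RECONFIGURE;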
Note: Paul White also mentioned max server memory as a setting to check, in a response to Performance problem after migration from SQL Server 2005 to 2012. A good KB article to read is Using large amounts of memory can result in an inefficient plan in SQL Server.
Jonathan Kehayias - Tuning ‘cost threshold for parallelism’ from the Plan Cache helps to find a good value for CTFP.
Why is cost threshold for parallelism ignored?
Aaron Bertrand - Six reasons you should be nervous about parallelism has a discussion of some scenarios where MAXDOP is the solution.
Parallelism-Inhibiting Components are mentioned in Paul White - Forcing a Parallel Query Execution Plan