When to use Oracle hints? - sql

I'm doing some refactoring on an Oracle schema (Oracle version 10), and I see a lot of views that use the hint /*+ ALL_ROWS */. Other views use other types of hints.
Why should I use a hint? Doesn't the DB make the best choice based on the query? Many thanks!

That's a good question, but there's no single answer to it because there are different categories of hint for which different advice would apply. http://docs.oracle.com/cd/E11882_01/server.112/e16638/hintsref.htm#PFGRF501
ALL_ROWS is an optimisation approach, and it's perfectly valid to specify it in order to make it clear that your goal is to get the last row of the result set as early as possible, not the first. In many cases the optimiser will deduce that from the query anyway, so it may be redundant, but you're not going to harm anything by using it correctly.
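For illustration, here is a minimal sketch of the hint in a view, with FIRST_ROWS(n) shown for contrast. The view, table and column names are hypothetical. ALL_ROWS has been the default OPTIMIZER_MODE from 10g onward, so it is worth checking the current setting first (querying V$PARAMETER needs the appropriate privilege). If it already says ALL_ROWS, the hint in the view is most likely redundant, as described above.

```sql
-- Value of OPTIMIZER_MODE in effect for this session (ALL_ROWS unless changed).
SELECT value FROM v$parameter WHERE name = 'optimizer_mode';

-- Hypothetical view: the hint asks for the best overall throughput (all rows).
CREATE OR REPLACE VIEW order_totals_v AS
SELECT /*+ ALL_ROWS */
       o.customer_id,
       SUM(o.amount) AS total_amount
FROM   orders o
GROUP  BY o.customer_id;

-- The opposite goal: return the first 10 rows as quickly as possible.
SELECT /*+ FIRST_ROWS(10) */ o.order_id, o.amount
FROM   orders o
ORDER  BY o.order_date DESC;
```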
Then there are other categories, some of which might be characterised as being for testing and exploration, such as OPTIMIZER_FEATURES_ENABLE. Arguably, the hints that affect join order, access paths and join operations are of this type, as their use in applications is sometimes discouraged. However, the optimizer is not perfect and does not have perfect information. Sometimes it will make a choice based on incomplete information, and that choice needs to be corrected.
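The following sketches that exploratory use, assuming hypothetical CUSTOMERS and ORDERS tables. Explain the statement with and without join-order/join-method hints, then compare the plans before deciding whether a hint belongs in application code.

```sql
-- Exploratory: make CUSTOMERS the driving table and hash-join ORDERS to it.
EXPLAIN PLAN SET STATEMENT_ID = 'hinted' FOR
SELECT /*+ LEADING(c o) USE_HASH(o) */ c.customer_name, o.amount
FROM   customers c
JOIN   orders o ON o.customer_id = c.customer_id
WHERE  c.region = 'WEST';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, 'hinted'));

-- The same statement without hints, for comparison.
EXPLAIN PLAN SET STATEMENT_ID = 'plain' FOR
SELECT c.customer_name, o.amount
FROM   customers c
JOIN   orders o ON o.customer_id = c.customer_id
WHERE  c.region = 'WEST';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, 'plain'));

-- Testing only: optimize this statement as a 9.2 optimizer would.
SELECT /*+ OPTIMIZER_FEATURES_ENABLE('9.2.0') */ COUNT(*) FROM orders;
```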
Some hints are unquestionably useful and appropriate -- APPEND is possibly the best example, as it is the standard method for invoking direct path insert.
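Here is a minimal direct-path insert, assuming hypothetical SALES_STAGING and SALES_HISTORY tables with matching columns. In 10g the APPEND hint takes effect for INSERT ... SELECT. A separate APPEND_VALUES hint for INSERT ... VALUES only arrived in 11g Release 2. In some cases Oracle quietly falls back to a conventional insert, for example when the target table has enabled triggers or foreign keys.

```sql
-- Direct-path insert: new blocks are written above the high-water mark,
-- bypassing the buffer cache.
INSERT /*+ APPEND */ INTO sales_history
SELECT * FROM sales_staging;

-- Until this commit, querying SALES_HISTORY in the same transaction
-- raises ORA-12838.
COMMIT;
```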
In the end it's really difficult to give generalised advice on this. Every hint really needs to be assessed on whether it should be used in production code or not. If you understand the optimiser, understand what the hints you are considering really do, and know whether there are better alternatives (e.g. better statistics, different init parameters, or dynamic sampling, itself a hint), then you'll be able to make your own assessment and defend it if challenged.
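The alternatives mentioned above might look like this, sketched against a hypothetical ORDERS table. You can either refresh the statistics so that no hint is needed, or request dynamic sampling for a single statement.

```sql
-- Alternative 1: give the optimizer better statistics.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'ORDERS',
    cascade => TRUE);   -- also gather statistics on the table's indexes
END;
/

-- Alternative 2: dynamic sampling for one statement, requested via a hint.
-- Level 4 is only an example value; higher levels sample more blocks.
SELECT /*+ DYNAMIC_SAMPLING(o 4) */ COUNT(*)
FROM   orders o
WHERE  o.status = 'OPEN'
AND    o.region = 'WEST';
```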

Related

When to use hints in oracle query [duplicate]

I have gone through some documentation on the net, and using hints is mostly discouraged. I still have doubts about this. Can hints really be useful in production, especially when the same query is used by hundreds of different customers?
Are hints only useful when we know the number of records present in the tables? I am using the LEADING hint in my query. It gives faster results when the data is very large, but the performance is not that great when fewer records are fetched.
This answer by David is very good, but I would appreciate it if someone could clarify this in more detail.
Most hints are a way of communicating our intent to the optimizer. For instance, the LEADING hint you mention means "join the tables in this order". Why is this necessary? Often it's because the optimal join order is not obvious, either because the query is badly written or because the database statistics are inaccurate.
So one use of hints such as leading is to figure out the best execution path, then to figure out why the database doesn't choose that plan without the hint. Does gathering fresh statistics solve the problem? Does rewriting the FROM clause solve the problem? If so, we can remove the hints and deploy the naked SQL.
Sometimes we cannot resolve this conundrum and have to keep the hints in Production. However, this should be a rare exception. Oracle have had lots of very clever people working on the Cost-Based Optimizer for many years, so its decisions are usually better than ours.
But there are other hints we would not blink to see in Production. append is often crucial for tuning bulk inserts. driving_site can be vital in tuning distributed queries.
Conversely other hints are almost always abused. Yes parallel, I'm talking about you. Blindly putting /*+ parallel (t23, 16) */ will probably not make your query run sixteen times faster, and not infrequently will result in slower retrieval than a single-threaded execution.
So, in short, there is no universally applicable advice to when we should use hints. The key things are:
understand how the database works, and especially how the cost-based optimizer works;
understand what each hint does;
test hinted queries in a proper tuning environment with Production-equivalent data.
Obviously the best place to start is the Oracle documentation. However, if you feel like spending some money, Jonathan Lewis's book on the Cost-Based Optimizer is the best investment you could make.
I couldn't rephrase this any better, so I will paste it here.
(It's a brief explanation of "When Not To Use Hints" that I had bookmarked.)
In summary, don’t use hints when
What the hint does is poorly understood, which is of course not limited to the (ab)use of hints;
You have not looked at the root cause of bad SQL code and thus not yet tapped into the vast expertise and experience of your DBA in tuning the database;
Your statistics are out of date, and you can refresh the statistics more frequently or even fix the statistics to a representative state;
You do not intend to check the correctness of the hints in your statements on a regular basis, which means that, when statistics change, the hint may be woefully inadequate;
You have no intention of documenting the use of hints anyway.
Source link here.
I can summarize this as follows: using hints is not only a last resort but often a sign that the root cause of the issue is not understood. The CBO (Cost-Based Optimizer) does an excellent job if you just make sure some basics are in place. Those include:
1. Fresh statistics
1.1. Index statistics
1.2. Table statistics
1.3. Histograms
2. Correct JOIN conditions and INDEX utilization
3. Correct database settings
This article here is worth reading:
Top 10 Reasons for poor Oracle performance
Presented by none other than Mr. Donald Burleson.
Cheers
In general, hints should be used only in exceptional cases. I know of the following situations where they make sense:
Workaround for Oracle bugs
Example: once, for a SELECT statement, I got the error ORA-01795: maximum number of expressions in a list is 1000, even though the query did not contain an IN expression at all.
The problem was that the queried table contains more than 1000 (sub)partitions, and Oracle transformed my query. Using the (undocumented) hint NO_EXPAND_TABLE solved the issue.
Data warehouse applications during staging
During staging, your data can change significantly in ways the table/index statistics do not reflect, because statistics are gathered only periodically by the automatic statistics job. If you know your data structure, hints can be useful, since they are faster than manually running DBMS_STATS.GATHER_TABLE_STATS(...) again and again between your operations. On the other hand, you can run DBMS_STATS.GATHER_TABLE_STATS() even for single columns, which might be the better approach.
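For the single-column approach, here is a sketch assuming a hypothetical staging table STG_SALES whose LOAD_DATE column changes a lot between loads. METHOD_OPT limits the column statistics (and histogram) to that one column, so the call is typically cheaper than a full re-gather.

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname       => USER,
    tabname       => 'STG_SALES',
    method_opt    => 'FOR COLUMNS load_date SIZE AUTO',  -- column stats for LOAD_DATE only
    no_invalidate => FALSE);  -- invalidate dependent cursors so they are re-optimized
END;
/
```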
Online Application Upgrade Hints
From Oracle documentation:
The CHANGE_DUPKEY_ERROR_INDEX, IGNORE_ROW_ON_DUPKEY_INDEX, and
RETRY_ON_ROW_CHANGE hints are unlike other hints in that they have a
semantic effect. The general philosophy explained in "Hints" does not
apply for these three hints.
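The following sketches one of these hints, assuming hypothetical CUSTOMERS and CUSTOMERS_STAGING tables with a unique index CUSTOMERS_PK. These three hints were introduced in 11g Release 2, so they are not available in the 10g schema from the original question. Because the hint decides which rows end up in the table, removing it changes the result, not just the plan.

```sql
-- Rows that would violate the unique index CUSTOMERS_PK are skipped
-- instead of raising ORA-00001 (names are hypothetical).
INSERT /*+ IGNORE_ROW_ON_DUPKEY_INDEX(customers customers_pk) */
INTO   customers (customer_id, customer_name)
SELECT customer_id, customer_name
FROM   customers_staging;
```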

Subselect vs Slicer in MDX

If I want my results to be evaluated in the context of any tuple in MDX, but don't want this tuple to be part of the results, I use either of the below two options.
1. SUBSELECT
SELECT [Measures].[SomeMeasure] ON 0,
[DimName].[HierName].children ON 1
FROM
(SELECT foo.bar.&[Val] ON 0 FROM
[MyCube])
2. SLICER
SELECT [Measures].[SomeMeasure] ON 0,
[DimName].[HierName].children ON 1
FROM
[MyCube]
WHERE (foo.bar.&[Val])
A third option that came to my mind is the EXISTS clause, but I soon realized that it is meant for something else altogether.
So, other aspects aside, I am interested in the general performance of these queries, any benchmarks or best practices to be kept in mind and which one to go for in which circumstances.
As mostly with optimizer questions, the answer is: It depends. I would say WHERE is faster in many situations, but there are cases where subselect is faster.
Vendors normally do not document optimizers in every detail. Some are better documented than others, and Analysis Services is a typical example of an engine with a less documented optimizer. I would think they have many, many rules in their code like "if this and that, but not a third condition, then go along that route". These rules constantly change, so any documentation would be outdated with more or less every hotfix.
As I said, the situation is a bit better for many relational engines. In SQL Server, for example, you can at least display a plan that is more or less understandable. But even there you do not know exactly why the optimizer chose this plan and not another. Sometimes you have to try several approaches to get the optimizer onto another path (like using an index, ...). And a new release of SQL Server may handle things differently: hopefully better in most cases, but possibly worse in a few rare cases.
That, too, is clearly not a well-defined, documented way of writing code, just trial and error.
In summary: You will have to test with your cube and your typical queries.
Anyway, in many cases, the performance difference is so small that it is not relevant.
Finally, the best documentation that is available for the Analysis Services optimizer is the old blog of one of the Analysis Services query engine developers at http://sqlblog.com/blogs/mosha/default.aspx. This being a blog, it is not very systematic, but just a collection of some random samples of optimizer behavior with the reasons behind it.
As far as I know, if you want to cache the results of your queries and improve overall throughput, then the slicer is better. If you just care about single-query performance, you can get better performance with a subselect.
In answer to the question, the following information is from Chris Webb:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/c1fe120b-256c-425e-93a5-24278b2ab1f3/subselect-or-where-slice?forum=sqlanalysisservices
First of all, it needs to be said that subselects and the Where clause do two different things - they're not interchangeable in all circumstances, they may return different results, and sometimes one performs better, sometimes the other does, because they may result in different query plans being generated. One technique is not consistently 'better' than the other on all cubes, and any differences in performance may well change from service pack to service pack.
To answer the original question: use whichever you find gives you the best query performance and returns the results you want to see. In general I prefer the Where clause (which is not deprecated); the reason is that while a subselect may perform faster initially in some cases, it limits Analysis Services' ability to cache the result of calculations which means slower performance in the long term:
http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!3057.entry

Oracle SQL user defined hints

Does Oracle allow the creation of mnemonics for hints in SQL? I am not able to find any such concept in the Oracle docs, but I have found certain queries running in production with hints like the one below:
select /*+ AHintWhichIsNotInOracleDocumentation */ * from some_table;
I thought optimizer would safely ignore this until I found http://www.confio.com/logicalread/oracle-11g-making-query-run-magically-faster-mc02/#.U_D8WqOTJUA
The author talks about "adding" a hint called "RICHS_SECRET_HINT" to the "X$" tables. Is this feature available in Oracle? If yes, links to the docs, please. Thanks.
Edit:
To clarify further, I am trying to find out whether Oracle provides a way to create a key-value relationship between new hints and Oracle-provided hints. This seems to be a pointless feature, since it would only help shorten hints when a SQL statement uses a lot of them, which is rarely the case. But given the hints I have seen at work, I am curious whether such a thing exists.
So, in essence, for the SQL above I am expecting some mapping between AHintWhichIsNotInOracleDocumentation and Oracle's standard hints such as ORDERED, USE_NL, etc.
The article you've linked to has nothing to do with creating your own hint. The article is demonstrating that it is relatively easy to trick yourself into thinking that you have improved the performance of a query (in this case by adding a hint that does nothing) when the reality is that the performance improved only because data has been cached by the prior executions.
You cannot define your own hints. There are hints that are undocumented. Given how rarely using a documented hint is the proper long-term answer to improving query performance (it almost always makes sense to fix the underlying statistics issue or to create a profile/ outline/ etc.), it would be exceedingly unlikely that you'd want to use an undocumented hint. I can't imagine a case where it would make sense to be able to define your own hint.
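You can check for yourself that an unrecognized hint is treated as a plain comment, and avoid the caching trap from the article. Something along these lines works on a test system. SOME_TABLE and RICHS_SECRET_HINT are taken from the question. The flush requires the ALTER SYSTEM privilege and should not be run on a busy production instance.

```sql
-- An unrecognized hint is just a comment: Oracle ignores it without an error.
EXPLAIN PLAN SET STATEMENT_ID = 'bogus' FOR
SELECT /*+ RICHS_SECRET_HINT */ * FROM some_table;

EXPLAIN PLAN SET STATEMENT_ID = 'plain' FOR
SELECT * FROM some_table;

-- Compare the two plans; they should be the same.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, 'bogus'));
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, 'plain'));

-- Test systems only: empty the buffer cache so that a timing comparison
-- is not flattered by blocks cached during the previous run.
ALTER SYSTEM FLUSH BUFFER_CACHE;
```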

In non-procedural languages, what specifies how things are to be done?

If you compare C vs SQL, this is the argument:
In contrast to procedural languages
such as C, which describe how things
should be done, SQL is nonprocedural
and describes what should be done.
So, the how part for languages like SQL is specified by the language itself, is it? What if I want to change the way some query works. Suppose I want to change the way a SELECT is handled. Is that possible?
So, the how part for languages like
SQL is specified by the language
itself, is it?
Not strictly by the language (ie. SQL), but normally by the database and its optimiser. As such, even where the same data is being queried from tables with the same structures and the same indexes, some databases will build the resultset in a different way to others.
Suppose I want to change the way a
SELECT is handled. Is that possible?
To some degree, yes. You can either:
Rewrite the query, to achieve the same result a different way, or
Use hinting - http://en.wikipedia.org/wiki/Hint_%28SQL%29
Neither of these directly instructs the database engine which approach to use. Both of them will affect how the result set is produced, however, and the effect is likely to vary between databases.
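For illustration, here are three ways of asking for the same rows, assuming Oracle's HR sample schema. The index name EMP_DEPARTMENT_IX is taken from that schema; adjust it for your own. The first two differ only in how the query is written, while the third adds an Oracle-specific hint. Other databases use a different hint syntax, or none at all.

```sql
-- 1. Declarative: state the result; the optimizer picks the access path.
SELECT e.last_name
FROM   employees e
WHERE  e.department_id IN (SELECT d.department_id
                           FROM   departments d
                           WHERE  d.location_id = 1700);

-- 2. Rewritten: the same rows, expressed with EXISTS.
SELECT e.last_name
FROM   employees e
WHERE  EXISTS (SELECT 1
               FROM   departments d
               WHERE  d.department_id = e.department_id
               AND    d.location_id = 1700);

-- 3. Hinted (Oracle syntax): suggest an index on EMPLOYEES.
SELECT /*+ INDEX(e emp_department_ix) */ e.last_name
FROM   employees e
WHERE  e.department_id IN (SELECT d.department_id
                           FROM   departments d
                           WHERE  d.location_id = 1700);
```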
Additionally, I understand that some databases have additional interfaces that allow more low-level interaction with the database engine, enabling greater control over how a query is built than is possible from plain SQL. (However, your question did specify SQL.)
This is actually exaggerating the difference. There is no clear-cut point at which one is telling how things are done and the other only telling what is done. Rather, one may have to specify what/how things are done at a greater level of detail than the other. A typical SQL implementation allows the user to control such things as which indexes are used (or ignored), what kind of locking to do, and so on.
If you were to do the same job in C, you would (at some point) have to specify a great deal more detail (unless you used something like ODBC). Nonetheless, you're still telling what should be done, not all the details of how it should be done (e.g., despite being about as low-level as possible short of assembly language, C will still do some type conversions automatically, so you don't have to tell it how to do something like adding an integer to a floating point number -- you just tell it to add them, and it handles the details).
Bottom line: trying to talk about one as procedural and the other as non-procedural is misleading. SQL doesn't always require as much detail, but it's a difference of degree, not really "how" versus "what".

Fastest way to become a MySQL expert?

I have been using MySQL for years, mainly on smaller projects until the last year or so. I'm not sure if it's the nature of the language or my lack of real tutorials, but I feel unsure whether what I'm writing is the proper way to do things for optimization and scaling.
Although I'm self-taught in PHP, I'm very sure of myself and the code I write, and I can easily compare it with other people's code.
With MySQL, I'm not sure whether (and in what cases) an INNER JOIN or LEFT JOIN should be used, nor am I aware of the large amount of functionality that it has. While I've written code for databases that handled tens of millions of records, I don't know if it's optimum. I often find that a small tweak will make a query take less than 1/10 of the original time... but how do I know that my current query isn't also slow?
I would like to become completely confident in my ability to optimize databases and make them scale. Using it is not a problem: I use it on a daily basis in a number of different ways.
So, the question is, what's the path? Reading a book? Website/tutorials? Recommendations?
For one, EXPLAIN is your friend. If you learn to use this tool, you should be able to optimize your queries very effectively.
Scan the MySQL manual and read Paul DuBois' MySQL book.
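Here is a minimal sketch of EXPLAIN in MySQL, using a hypothetical ORDERS table. In the output, look at the type, key and rows columns. A type of ALL means a full table scan, and key shows which index (if any) the optimizer picked.

```sql
-- Hypothetical table with a secondary index on customer_id.
CREATE TABLE orders (
  order_id    INT PRIMARY KEY,
  customer_id INT NOT NULL,
  created_at  DATETIME NOT NULL,
  KEY idx_orders_customer (customer_id)
);

-- Shows whether idx_orders_customer is used for this lookup.
EXPLAIN SELECT order_id, created_at
FROM   orders
WHERE  customer_id = 42;
```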
Use EXPLAIN SELECT, SHOW VARIABLES, SHOW STATUS and SHOW PROCESSLIST.
Learn how the query optimizer works.
Optimize your table formats.
Maintain your tables (myisamchk, CHECK TABLE, OPTIMIZE TABLE).
Use MySQL extensions to get things done faster.
Write a MySQL UDF if you notice that you need some function in many places.
Don't use GRANT at table level or column level if you don't really need it.
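In practice, the commands from that list look like this, sketched against a hypothetical ORDERS table. ANALYZE TABLE is not in the original list, but it belongs with the maintenance commands because it refreshes the key distribution statistics the optimizer relies on. On InnoDB tables, OPTIMIZE TABLE is carried out as a table rebuild plus analyze.

```sql
-- Inspect server configuration, counters and running statements.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL STATUS LIKE 'Slow_queries';
SHOW FULL PROCESSLIST;

-- Table maintenance.
CHECK TABLE orders;     -- look for errors
ANALYZE TABLE orders;   -- refresh key distribution statistics
OPTIMIZE TABLE orders;  -- reclaim space and defragment
```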
http://dev.mysql.com/tech-resources/presentations/presentation-oscon2000-20000719/index.html
The only way to become an expert in something is experience, and that usually takes time. It also helps to have good mentors who are better than you and can teach you what you are missing. The problem is that you don't know what you don't know.
Research and experience - if you don't have the projects to warrant the research, make them. Make three tables with related data and make up scenarios.
E.g.
Make a table of movies and their data
Make a table of users
Make a table of ratings by users
Spend time learning how joins work, how to get movies in a particular rating range in one query, and how to search the movies table (LIKE, REGEXP). As mentioned, use EXPLAIN to see how different things affect speed. Make a day of it; I guarantee your handle on it will be greatly increased.
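Here is a possible starting schema for that exercise, sketched in MySQL syntax. The table and column names are made up, and FOREIGN KEY enforcement assumes InnoDB. The first two SELECTs also show the INNER JOIN versus LEFT JOIN difference from the question. The INNER JOIN returns only movies that have ratings, while the LEFT JOIN keeps unrated movies with a count of zero.

```sql
CREATE TABLE movies (
  movie_id     INT AUTO_INCREMENT PRIMARY KEY,
  title        VARCHAR(200) NOT NULL,
  release_year SMALLINT
);

CREATE TABLE users (
  user_id  INT AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE ratings (
  user_id  INT NOT NULL,
  movie_id INT NOT NULL,
  rating   TINYINT NOT NULL,            -- e.g. 1 to 5
  PRIMARY KEY (user_id, movie_id),      -- one rating per user per movie
  KEY idx_ratings_movie (movie_id),
  FOREIGN KEY (user_id)  REFERENCES users (user_id),
  FOREIGN KEY (movie_id) REFERENCES movies (movie_id)
);

-- Movies whose average rating is in a range (INNER JOIN: rated movies only).
SELECT m.title, AVG(r.rating) AS avg_rating
FROM   movies m
JOIN   ratings r ON r.movie_id = m.movie_id
GROUP  BY m.movie_id, m.title
HAVING AVG(r.rating) BETWEEN 3.5 AND 5;

-- LEFT JOIN keeps movies with no ratings at all (rating_count = 0).
SELECT m.title, COUNT(r.movie_id) AS rating_count
FROM   movies m
LEFT JOIN ratings r ON r.movie_id = m.movie_id
GROUP  BY m.movie_id, m.title;

-- Searching titles with LIKE and REGEXP.
SELECT title FROM movies WHERE title LIKE '%star%';
SELECT title FROM movies WHERE title REGEXP '^the ';
```

Prefix any of the SELECTs with EXPLAIN, and compare the plans before and after dropping or adding an index such as idx_ratings_movie.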
If you're still struggling for case-scenarios, start looking here on SO for questions and try out those scenarios yourself.
I don't know if MIT open courseware has anything about databases... Well whaddya know? They do: http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-830Fall-2005/CourseHome/
I would recommend that as one source, based on MIT's reputation alone. If you can take a formal course at a university, you may find that helpful. Also, a good understanding of fundamental discrete mathematics/logic would certainly do no harm.
As others have said, time and practice are the only real approach.
More practically, I found that EXPLAIN worked wonders for me personally. Learning to read the output of that was probably the biggest single leap I made in being able to write efficient queries.
The second thing I found really helpful was SQL Tuning by Dan Tow, which describes a fairly formal methodology for extracting performance. It's a bit involved, but works well in lots of situations. And if nothing else, it will give you a much better understanding of the way joins are processed.
Start with a class like this one: https://www.udemy.com/sql-mysql-databases/
Then use what you've learned to create and manage a number of SQL databases and run queries. Getting to the expert level is really about practice. But of course you need to learn the pieces before you can practice.