I'm noticing exponential performance degradation in MySQL query execution times when queries include calls to UDFs in the SELECT or WHERE clauses. The UDFs in question query local tables to return scalar values, so they're not just evaluating arithmetic expressions but serving as correlated subqueries. I've fixed the performance problem by simply removing the UDFs and rewriting the queries with correlated subqueries, more complex joins, etc.
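To make the row-by-row effect concrete, here is a minimal sketch in Python with SQLite standing in for MySQL (an assumption: neither MySQL nor SQL Server is involved, and the `customers`/`invoices` tables and the `customer_total` function are hypothetical). The registered function runs its own query, so the engine treats it as a black box and calls it once per outer row; the set-based rewrite returns the same rows from a single aggregate that the planner can optimize as a whole.

```python
import sqlite3

# Hypothetical schema: customers and their invoices.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE INDEX idx_invoices_customer ON invoices(customer_id);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"customer {i}") for i in range(1, 101)])
conn.executemany("INSERT INTO invoices (customer_id, amount) VALUES (?, ?)",
                 [(i % 100 + 1, float(i)) for i in range(1000)])

udf_calls = 0

def customer_total(customer_id):
    """Scalar function that runs its own query; the engine cannot see inside it."""
    global udf_calls
    udf_calls += 1
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]

conn.create_function("customer_total", 1, customer_total)

# UDF version: one hidden query per customer row.
via_udf = conn.execute(
    "SELECT id, customer_total(id) FROM customers ORDER BY id").fetchall()

# Set-based rewrite: one aggregate over a join, visible to the planner.
via_join = conn.execute("""
    SELECT c.id, COALESCE(SUM(i.amount), 0)
    FROM customers c LEFT JOIN invoices i ON i.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

assert via_udf == via_join
print(f"function calls: {udf_calls}")
```

The call counter is the point: each of the 100 calls runs a separate SELECT, whereas the join reads `invoices` in one pass.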
I suppose if I only had experience with MySQL, I would simply accept this as a reality, adjust my use of UDFs, and move on. But before working with MySQL, I worked for 5+ years on SQL Server. I built a billing system that processed much larger data sets and relied very heavily on both scalar and table-valued user-defined functions. Those UDFs also performed queries (i.e., not just arithmetic operations). I didn't experience this sort of performance penalty when using user-defined functions on SQL Server.
What I'm wondering is whether anyone here knows SQL Server and MySQL internals well enough to confirm or refute my current theory about the cause of this difference in UDF performance between the two systems. My theory is that SQL Server's optimizer evaluates UDFs differently than MySQL's. Perhaps it's because the storage engines are decoupled in MySQL? Or maybe UDFs are more widely used on SQL Server and MySQL's optimizer simply hasn't evolved as far yet? What I'm thinking is that maybe the SQL Server optimizer treats included UDFs as part of the surrounding query (when possible) and then optimizes them along with the rest of the query. Maybe I'm way off the mark here, but I just never saw this kind of performance hit from using UDFs on SQL Server.
Any light others can shed on this issue will be appreciated.
UDFs have known limitations and problems. Please see: Are UDFs Harmful to SQL Server Performance?
There are many articles on this topic. Hopefully this one is accessible to non-subscribers: Beware Row-by-Row Operations in UDF Clothing
I know this is an old question, but it comes up first in a Google search for "MySQL UDF performance" and does not yet have an adequate answer: one link in the accepted answer is broken, and the other does not appear to discuss the specifics of MySQL UDFs.
First, let us make sure we are talking about actual MySQL UDFs. In MySQL, there is a distinction between a "stored function" and a UDF. A stored function is run by the internal stored function/procedure interpreter. A UDF is written in C or C++ and compiled into a shared library, which the MySQL server loads into memory; when called, it runs as native machine code on the CPU. Thus, UDF performance is frequently orders of magnitude better than that of stored functions.
So first of all, make sure you are talking about an actual UDF and not a stored function.
Second, MySQL UDF performance depends on the nature of the algorithm it executes and the quality of its implementation. For example, if your UDF tests all possible triplets of characters in a 1000-byte string, it will examine 1000^3 = 1 billion combinations and will take several seconds per row. So if removing the UDFs makes your code run significantly faster, the next step is to debug the UDF itself to make sure it is written optimally; or perhaps the question the UDF is trying to answer simply cannot be answered quickly.
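The arithmetic behind that estimate, as a quick sketch: counting ordered triplets with repetition gives the 1 billion figure, and even counting only unordered triplets of distinct positions still gives over 166 million. The throughput constant below is a made-up assumption for illustration, not a measured MySQL number.

```python
import math

N = 1000  # string length in bytes, as in the example above

# Ordered triplets with repetition (positions i, j, k each range over N): N^3.
ordered = N ** 3
# Unordered triplets of distinct positions: C(N, 3).
distinct = math.comb(N, 3)

# Hypothetical throughput, chosen only for illustration.
CHECKS_PER_SECOND = 300_000_000

def seconds_per_row(checks, rate=CHECKS_PER_SECOND):
    """Estimated time for one UDF call that performs `checks` comparisons."""
    return checks / rate

print(f"{ordered:,} ordered triplets, about {seconds_per_row(ordered):.1f} s per row")
print(f"{distinct:,} distinct triplets, about {seconds_per_row(distinct):.2f} s per row")
```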
That said, a well-written UDF answering a relatively simple question is usually lightning-fast compared to the I/O needed to feed it the data to analyze.
I am writing a paper about query optimization in different DBMSs, and I am trying to find the core differences between them.
Both use cost-based optimization (CBO) in the same way: parse the query, generate candidate plans, and pick the best one given statistics about the database.
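A toy illustration of that generate-and-cost loop (parsing omitted): the sketch below enumerates left-deep join orders for three hypothetical tables, estimates each plan's cost from hypothetical row counts and join selectivities, and keeps the cheapest. It is not Oracle's or MySQL's cost model; real optimizers differ precisely in how many alternatives they enumerate and how they cost them.

```python
from itertools import permutations

# Hypothetical statistics: estimated row count per table.
ROWS = {"orders": 1_000_000, "customers": 50_000, "regions": 20}

# Hypothetical join selectivities; pairs without a join predicate get 1.0 (cross product).
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "regions"}): 1 / 20,
}

def plan_cost(order):
    """Cost of a left-deep join order = sum of estimated intermediate result sizes."""
    joined = [order[0]]
    rows = ROWS[order[0]]
    cost = 0.0
    for table in order[1:]:
        selectivity = 1.0
        for previous in joined:
            selectivity *= SELECTIVITY.get(frozenset({previous, table}), 1.0)
        rows = rows * ROWS[table] * selectivity
        cost += rows
        joined.append(table)
    return cost

def choose_plan(tables):
    """Generate every candidate join order and keep the cheapest one."""
    return min(permutations(tables), key=plan_cost)

best = choose_plan(["orders", "customers", "regions"])
print(best, plan_cost(best))
```

Starting from `orders` and joining `regions` next produces a 20-million-row intermediate result, so the search settles on joining the two small tables first and `orders` last.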
I'm still researching these two engines (Oracle and MySQL), but if someone knows how they differ (or don't), that would be appreciated.
This is not a comprehensive answer at all, but I wanted to give you my insight. In short, Oracle has a much more developed SQL optimizer.
For starters, Oracle has many more algorithms to choose from. This means that Oracle sometimes distinguishes between subtle differences and offers, let's say, three algorithms, while MySQL (under the same circumstances) has only one to choose from. Therefore, Oracle has better options for particular cases.
Another difference is that MySQL's execution plans are not very readable. I'm not saying they are bad internally, just that EXPLAIN EXTENDED doesn't tell you many specifics. Oracle clearly distinguishes between access and filter predicates, while in MySQL you don't really know what's going on.
Oracle has many algorithms suited to parallel processing across multiple servers, while MySQL is limited to multiple threads on the same machine. This can make a difference for highly parallelizable queries that benefit from multiple servers.
Oracle still has an RBO (rule-based optimizer) that can be useful on some occasions. MySQL doesn't. That said, Oracle recommends against using it, but it's still there if you need it.
Oracle offers a myriad of "hints" to the optimizer in the form of specially formatted comments (/*+ ... */) with which you can tweak the execution plan to suit your needs. MySQL has fewer options for this (for example, index hints such as USE INDEX and FORCE INDEX).
Obvious security benefits aside, are there significant performance gains from converting complex queries into stored procedures in an SAP HANA database?
If so, are there metrics I can use to gauge perceived benefits?
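From SQL Server 2005 onward, all SQL statements are compiled and their plans cached, regardless of whether they come from inline code, a stored procedure, or anywhere else. So stored procedures won't give you a performance boost by themselves (though an ad hoc plan is generally only reused when the statement text matches exactly, so parameterize inline queries to get the same reuse). They do give you better abstraction, security, and ease of maintenance.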
Read more about it here.
As for SAP HANA, I tried comparing it with Microsoft SQL Server (this article sheds some light on it), and I cannot say definitively that it compiles and caches inline queries, but it most probably does if you're using a recent version.
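A minimal sketch of why "compiled and cached" does not automatically mean "reused", assuming a cache keyed on exact statement text (a simplification; this `PlanCache` class is hypothetical and is not SQL Server's or HANA's implementation). Statements with literal values baked in produce a new cache entry per value, while a parameterized statement is compiled once.

```python
class PlanCache:
    """Toy plan cache keyed on exact statement text (hypothetical, for illustration only)."""

    def __init__(self):
        self.plans = {}
        self.compilations = 0

    def _compile(self, sql):
        # Stand-in for parsing and optimizing; a real engine builds an execution plan here.
        self.compilations += 1
        return f"plan<{sql}>"

    def get_plan(self, sql):
        if sql not in self.plans:
            self.plans[sql] = self._compile(sql)
        return self.plans[sql]

cache = PlanCache()

# Literal values baked into the text: each distinct value is a new key, so a new compilation.
for customer_id in range(5):
    cache.get_plan(f"SELECT * FROM orders WHERE customer_id = {customer_id}")

# Parameterized text: one key, compiled once and reused for every value.
for customer_id in range(5):
    cache.get_plan("SELECT * FROM orders WHERE customer_id = ?")

print(cache.compilations)
```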
Read the other solution before reading this one.
Essentially, I ran some tests, swapped my project's dynamic queries for stored procedures, and saw massive performance gains.
I was reading the SAP HANA Best Practices reference and came across this passage under Performance:
Executing dynamic SQL is slow because compile time checks and query optimization must be done for every invocation of the procedure. Another related problem is security because constructing SQL statements without proper checks of the variables used may harm security.
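The two problems the passage names can be shown with Python's sqlite3 module (SQLite standing in for HANA here; the `users` table and both helper functions are hypothetical). Pasting the value into the text changes the statement on every call, so nothing can be reused, and it lets crafted input rewrite the query; binding the value as a parameter keeps the text constant and treats the input purely as data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a1"), ("bob", "b2")])

def find_dynamic(name):
    # Dynamic SQL: the value becomes part of the statement text, so the text
    # differs per call and unchecked input can change the query's logic.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()

def find_bound(name):
    # Bound parameter: constant statement text; the value is only ever data.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

hostile = "x' OR '1'='1"
print(find_dynamic(hostile))  # the injected OR matches every row
print(find_bound(hostile))    # no user has that literal name
```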
Articles on the internet say user-defined functions can either hurt or improve performance.
Now, I know that standard SQL is pretty limited; however, some behavior can be written either with standard functions or with T-SQL built-in functions.
For example, adddays() vs. dateadd(). I've also heard that it's better to use coalesce(), the ANSI standard function, rather than isNull().
What is the performance difference between using the ANSI SQL standard functions vs T-SQL functions?
Does T-SQL add any performance overhead whatsoever in trying to make the job easier, or not?
My research does not seem to indicate a trend.
You will need to approach this on a case-by-case basis and do actual testing. There is no general rule, other than that Microsoft tries to make the entire stack perform as well as possible. TESTING is what you need to do; we can't tell you that a certain thing will always be faster. That would be really bad advice.
It is important to do this testing on your actual production data, or preferably a copy of it. Do not rely on tests done against data sets that aren't yours. When you're talking about performance differences between functions, very subtle things can make a big difference. Things like the size of the table, the data types involved, the indexing, and the SQL Server version can change the results of these tests. That is why "no one has done this" for you. We can't.
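A sketch of what such a test could look like, using SQLite's COALESCE() and IFNULL() as stand-ins because T-SQL's ISNULL() can't be run here (so the numbers it prints say nothing about SQL Server). The table and row counts are hypothetical placeholders; the method is what carries over: check that both forms return the same answer, then time each several times on the same data and compare the best runs.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
# Hypothetical data: every third row has a NULL in column a.
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(None if i % 3 == 0 else i, i) for i in range(20_000)])

QUERIES = {
    "coalesce": "SELECT SUM(COALESCE(a, b)) FROM t",
    "ifnull": "SELECT SUM(IFNULL(a, b)) FROM t",
}

def run(sql):
    return conn.execute(sql).fetchone()[0]

# Step 1: make sure both forms give the same answer before comparing speed.
results = {name: run(sql) for name, sql in QUERIES.items()}
assert len(set(results.values())) == 1

# Step 2: time each form several times; the best run is the least noisy estimate.
timings = {
    name: min(timeit.repeat(lambda sql=sql: run(sql), repeat=5, number=10))
    for name, sql in QUERIES.items()
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 10 runs (best of 5)")
```

To apply it to your own case, point the connection at a copy of your production data and swap in your real queries.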
I'm writing many reporting queries for my current employer using Oracle's WITH clause, which lets me create simple steps, each of them a data-oriented transformation, that build upon each other to perform a complex task.
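For readers less familiar with the pattern, here is a small sketch of a stepwise WITH query, run through SQLite from Python (SQLite accepts the same WITH syntax for this case; the `sales` table and the step names are hypothetical). Each named subquery is one transformation, and later steps refer to earlier ones by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('north', 'widget', 120), ('north', 'gadget', 80),
        ('south', 'widget', 200), ('south', 'gadget', 50),
        ('west',  'widget', 40);
""")

REPORT = """
WITH region_totals AS (          -- step 1: aggregate per region
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
),
overall AS (                     -- step 2: builds on step 1
    SELECT SUM(total) AS grand_total FROM region_totals
),
shares AS (                      -- step 3: combines steps 1 and 2
    SELECT r.region, r.total, ROUND(100.0 * r.total / o.grand_total, 1) AS pct
    FROM region_totals r CROSS JOIN overall o
)
SELECT region, total, pct FROM shares ORDER BY total DESC
"""

for row in conn.execute(REPORT):
    print(row)
```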
It was brought to my attention today that overuse of the WITH clause could have negative side effects on the Oracle server's resources.
Can anyone explain why overuse of the Oracle WITH clause might cause a server to crash? Or point me to some articles where I can research appropriate use cases? I started using the WITH clause heavily to add structure to my code and make it easier to understand. I hope that, with some informative responses here, I can continue to use it efficiently.
If an example query would be helpful I'll try to post one later today.
Thanks!
Based on http://www.dba-oracle.com/t_with_clause.htm, it looks like this is a way to avoid using temporary tables. However, as others will note, this may actually mean heavier, more expensive queries that put an additional drain on the database server (Oracle may optimize each WITH subquery either as an inline view or as a temporary table).
It probably won't 'crash'; that's a bit dramatic. More likely it will just be slower, use more memory, etc. How that affects your company will depend on the amount of data, the number of processors, and the amount of processing (whether or not you use WITH).
I was just wondering where I could find more information on optimizations like these. Google searches tend to emphasize prepared queries and such, rather than the angle of the abstraction that SQL provides.
Source:
http://www.joelonsoftware.com/articles/LeakyAbstractions.html
The SQL language is meant to abstract away the procedural steps that are needed to query a database, instead allowing you to define merely what you want and let the database figure out the procedural steps to query it. But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify "where a=b and b=c and a=c" than if you only specify "where a=b and b=c" even though the result set is the same. You're not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster.
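A quick check of the logical half of that claim, using SQLite from Python with a hypothetical table: adding the redundant `a = c` predicate never changes the result set, because it is implied by `a = b AND b = c` (and a NULL in any of the columns already makes the row fail). Whether the extra predicate changes the plan or the speed depends on the engine and version, which is exactly the leak the quoted article describes; compare the plans rather than assuming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
# Hypothetical data: rows where a, b and c all agree occur for x % 12 in {0, 1}.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(x % 4, x % 3, x % 2) for x in range(60)])

two_predicates = "SELECT * FROM t WHERE a = b AND b = c ORDER BY rowid"
three_predicates = "SELECT * FROM t WHERE a = b AND b = c AND a = c ORDER BY rowid"

rows_two = conn.execute(two_predicates).fetchall()
rows_three = conn.execute(three_predicates).fetchall()

# Same specification, same answer; only the procedure the engine picks may differ.
assert rows_two == rows_three
print(len(rows_two), "matching rows")
```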
I'm looking at MySQL in particular.
You can try SQL Server Performance, although I think it's geared towards MS SQL Server more than other RDBMSs. Personally, I look at performance tuning as a process more than a collection of tidbits.
Once you get the process down, you're likely to come across single-item optimizations as you go, but it's the process itself that will give you the most bang for your buck. Learn how to read query plans (or the equivalent in your RDBMS), learn the internal, behind-the-scenes implementation of your server, how it stores and uses indexes, and how to find bottlenecks in I/O, memory, locking, etc.
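As a small example of reading a plan, here is SQLite's equivalent of EXPLAIN, driven from Python (the `orders` table and index name are hypothetical). The plan's detail text shows a full table scan before the index exists and an index search afterwards; MySQL's EXPLAIN presents the same kind of information in its own format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 500, i * 1.5) for i in range(10_000)])

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"

def plan(sql, params=()):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

before = plan(QUERY, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan(QUERY, (42,))

print("before:", before)
print("after: ", after)
```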
Books are better than web searches for learning performance tuning for a database. It is a complex subject and varies greatly from database to database and even, as @OMGPonies said, from version to version.
The only MySQL performance book I found on Amazon; I don't know how good it is:
http://www.amazon.com/High-Performance-MySQL-Optimization-Replication/dp/0596101716/ref=sr_1_1?ie=UTF8&s=books&qid=1277756707&sr=8-1
"these" are not optimizations.
Learn profiling: the source of all optimizations.
That's all you need.
The one you mentioned is not an "optimization tidbit". It was an example for a totally different subject.
And it is not meant to be applied blindly,
but only as a result of profiling, where applicable.
Your whole approach is wrong. There are no "optimization tidbits"; there is only profiling. Once you find that your "where a=b and b=c" query runs slowly, you can start looking for a solution, not sooner.
So, you have two instruments to use:
SELECT BENCHMARK(<count>, <expression>);
and
EXPLAIN <your query goes here>;
(BENCHMARK evaluates a scalar expression repeatedly, so a query has to be wrapped as a scalar subquery.) Study their output, then ask specific questions about your server, your settings, and your database. That's the only way. No "canned recipe" can help.
For casual reading, you can follow a blog named, unsurprisingly, http://mysqlperformanceblog.com