Articles on the internet say user-defined functions can either hurt or improve performance.
Now, I know that standard SQL is pretty limited; however, some of the same behavior can still be written using T-SQL built-in functions.
For example, adddays() vs. dateadd(). Another point I've heard is that it's also better to use coalesce(), the ANSI standard function, rather than isnull().
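To make the comparison concrete, here is the null-handling pair side by side (the table and column are made up for illustration):

    -- ANSI standard: portable across vendors
    SELECT COALESCE(middle_name, '') FROM people;

    -- T-SQL specific: SQL Server only
    SELECT ISNULL(middle_name, '') FROM people;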
What is the performance difference between using the ANSI SQL standard functions vs. the T-SQL functions?
Does T-SQL add any burden whatsoever on performance in trying to make the job easier, or not?
My research does not seem to indicate a trend.
You will need to approach this on a case-by-case basis and do actual testing. There is no general rule, other than that Microsoft tries to make the entire stack perform as well as possible. Testing is what you need to do - we can't tell you that a certain thing will always be faster. That would be really bad advice.
It is important to do this testing on your actual production data, preferably a copy of it. Do not rely on tests done against data sets that aren't yours. When you're talking about performance differences between functions, some very subtle things can make a big difference. Things like the size of the table, the data types involved, the indexing, and the SQL Server version can change the result of these tests. That is why "no one has done this" for you. We can't.
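As a minimal sketch of such a test in T-SQL (the table and column are placeholders - substitute your own schema, and run each form several times so the cache is warm):

    -- report CPU and elapsed time for each statement
    SET STATISTICS TIME ON;

    SELECT COALESCE(middle_name, '') FROM people;  -- ANSI form
    SELECT ISNULL(middle_name, '')   FROM people;  -- T-SQL form

    SET STATISTICS TIME OFF;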
I am doing a paper about query optimization in different DBMSs, and I am trying to find core differences between them.
Both use CBO (cost-based optimization) in the same way: parse the query -> generate candidate plans -> pick the best one given statistics about the database.
I'm still researching these two engines, but if someone knows how they differ (or don't), it will be appreciated.
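If it helps the paper, both engines will show you the plan they picked; a quick sketch (orders is a hypothetical table):

    -- MySQL: display the chosen plan
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    -- Oracle: generate the plan, then display it
    EXPLAIN PLAN FOR SELECT * FROM orders WHERE customer_id = 42;
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);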
Not a comprehensive answer at all, but I wanted to give you my insight. In short, Oracle has a much more developed SQL optimizer.
For starters, Oracle has many more algorithms to choose from. This means that sometimes Oracle distinguishes between subtle differences and offers, say, three algorithms where MySQL (under the same circumstances) has only one to choose from. Therefore, Oracle has better options for particular cases.
Another difference is that MySQL's execution plans are not very readable. I'm not saying they are bad internally, just that EXPLAIN EXTENDED doesn't tell you many specifics. Oracle makes a very clear distinction between access predicates and filter predicates, while in MySQL you don't really know what's going on.
Oracle has many algorithms suitable for parallel processing across multiple servers, while MySQL is limited to multiple threads on the same machine. This can make a difference for highly parallelizable queries that benefit from multiple servers.
Oracle still has an RBO (rule-based optimizer) that can be useful on some occasions; MySQL doesn't. Oracle recommends not using it, but it's still there if you need it.
Oracle offers a myriad of hints to the optimizer in the form of comments (/*+ ... */) with which you can tweak the execution plan to suit your needs. MySQL has far fewer clauses for this.
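A rough sketch of what those hints look like in each system (the table and index names are hypothetical):

    -- Oracle: hint embedded in a /*+ ... */ comment
    SELECT /*+ INDEX(e emp_name_idx) */ e.name
    FROM   emp e
    WHERE  e.name = 'SMITH';

    -- MySQL: the closest equivalent is an index hint in the FROM clause
    SELECT name
    FROM   emp FORCE INDEX (emp_name_idx)
    WHERE  name = 'SMITH';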
Concern about the long-term maintainability of SQL statements is constantly on my mind, especially when Scrum is used, where the code has no owner - that is, everyone must be able to repair and maintain each piece. Optimizing SQL procedures usually means converting them into set-based commands and using special operators. I need tips for keeping the code working without losing sight of the optimization-versus-readability trade-off.
Comments. If it's a newer command or an obscure one, make sure to leave a comment associated with the statement describing it in code/source. That way you don't have another developer down the road refactoring the statement to improve readability at the cost of performance. My general guideline with this is if someone of intermediate skill level or below would have to spend several minutes or more searching for what the statement is really doing, leave the comment to save them time.
I wouldn't worry so much about readability other than having the formatting conform to defined standards. Optimization is much more important than using only simple SQL that anyone can understand. That is where comments should come in... Explain what the SQL should be doing and why you chose a certain optimization technique. The added advantage to this is that it will help the next person who reads it to learn new SQL techniques.
I've found the best solution is to include, in your comments, a clearly qualified, reproducible optimization test for the query, using statistics from the optimizer. (This also works nicely for stored procedures, where the same issues appear.)
Include a statement about the nature of the testing context (hardware and data), data-generation code if necessary, and a clear description of the testing conditions (cache settings, repetitions, etc.). Better yet, agree on a team template for this spec.
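A sketch of what such a header might look like; every name and number below is invented - the point is the template, not the values:

    -- OPTIMIZATION NOTE (team template)
    --   Why:     set-based rewrite replaced a row-by-row cursor
    --   Context: SQL Server dev box, 2M-row copy of production orders
    --   Method:  cold cache (DBCC DROPCLEANBUFFERS), median of 5 runs
    --   Result:  14.2 s before, 0.8 s after
    SELECT o.customer_id, SUM(o.amount) AS total
    FROM   orders o
    GROUP  BY o.customer_id;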
Then enforcing comparisons needs to be built into your culture somewhere; the best solution would be for the culture to expect documented before-and-after optimization testing.
My experience of using Adobe ColdFusion, though still somewhat limited, has been absolutely joyful and pleasant.
Of all the good things I could say about ColdFusion, one feature completely blew me away. It might be neither very efficient nor useful in production, but anyway: I am talking about the so-called "query of queries" feature, or the dbtype="query" attribute of cfquery. It allows you to run SQL statements against arbitrary datasets, not just a database connection. You can, for example, join a resultset that you've just retrieved from the database with an in-memory structure (subject, of course, to certain limitations). It provides a quick-and-dirty way to "post-process" the data, which can sometimes be much more readable (and flexible, too!) than, say, iterating through the dataset in a loop.
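For readers who haven't seen it, a rough sketch: inside <cfquery dbtype="query"> you write ordinary-looking SQL against a previous result set rather than a table (the names here are made up):

    -- runs inside <cfquery dbtype="query"> against the in-memory
    -- result of a prior cfquery named empQuery, not the database
    SELECT   department, COUNT(*) AS headcount
    FROM     empQuery
    WHERE    salary > 50000
    GROUP BY department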
However, ColdFusion is not a very popular product, and I am not going to go over the reasons why. What I am asking is: is there any support for this technique in other languages (like a library that does more or less the same)? Python? Perl? Ruby? PHP? Anything? Because it seems to me that the potential of this feature is huge - maybe not in production code, but it is an absolute life-saver if you need to test something quickly. Needless to say, the SQL ColdFusion uses for this is somewhat limited, to my knowledge, but the idea is still great.
If you don't find anything that handles data as well as ColdFusion, remember that it plays very well with other programming languages. You can always do the heavy query processing in CF, then just wrap your processing logic in remote CFCs and expose them as web services serving up JSON.
That will let you benefit from what you find great about ColdFusion while trying out some other languages.
If you need to get away from CF, try SQLAlchemy in Python; or, as other posters said, Rails and LINQ are worth playing with.
I can't speak for Python, Ruby, Perl, or PHP. However, .NET has something called LINQ, which is essentially QoQ on steroids.
Lots of frameworks use object-relational mapping (ORM), which will convert your database tables to objects.
For example, using Rails you fetch data from a model instead of directly talking to the database. Queries, or finds, are returned as array objects, which can in turn be queried themselves.
You can also accomplish this in .NET by using LINQ. LINQ will let you query objects as well as databases.
In doing performance analysis of query of queries, I was surprised by the execution times: I could not get them to return in less than 10 ms in my tests, whereas simple queries to the actual database would return in 1 ms or less. My understanding (at least in CF MX 7) is that while this is a useful feature, it is not a highly optimized one. I found it much faster to loop over the query manually, doing conditional logic to replace what I was attempting to do with my query of queries.
That being said, it is faster than going back to the database if the initial query is slow. Just don't use it thinking that it will always be faster than a more creative sort or initial query, as each QoQ is far from instantaneous.
For Java, there are three projects worth taking a look at, each with its own positives and negatives, some more SQL-like than others: JoSQL, JXPath, and MetaModel.
Maybe one of these days I'll figure out how to call a QoQ directly from the Java underneath CF. ;)
This technique (ColdFusion's query-of-queries) is one of the worst ideas out there. Not only does it keep business logic in the database, but it takes what little business logic you have left in your code and shoves it out to the database just for spite.
What you need is a good language, not bad techniques to make up for deficiencies.
Python and Ruby, as well as other languages not on your list such as C# and Haskell, have exceptional support for writing arbitrary and powerful queries against in-memory objects. This is in fact the technique that you want, not ColdFusion's query-of-queries. The technique of writing queries against in-memory objects is an aspect of a general style of programming called functional programming.
I'm noticing exponential performance degradation in MySQL query execution times when queries include calls to UDFs in the SELECT or WHERE clauses. The UDFs in question query local tables to return scalar values - so they're not just performing arithmetic expressions, but serving as correlated subqueries. I've fixed the performance problem by simply removing the UDFs and rewriting with correlated subqueries, more complex joins, etc.
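To illustrate the kind of rewrite I mean (the schema here is hypothetical):

    -- before: scalar function called once per row, opaque to the optimizer
    SELECT o.id, order_total(o.id) AS total
    FROM   orders o;

    -- after: correlated subquery the optimizer can plan with the rest
    SELECT o.id,
           (SELECT SUM(p.amount)
            FROM   payments p
            WHERE  p.order_id = o.id) AS total
    FROM   orders o;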
I suppose if I only had experience with MySQL I would simply accept this as a reality, adjust my use of UDFs, and move on. But before working with MySQL I worked for 5+ years on SQL Server. I built a billing system that processed much larger data sets and relied very heavily on both scalar and table-valued user-defined functions. Those UDFs also performed queries (i.e., not just arithmetic operations). I didn't experience this sort of performance penalty when using user-defined functions on SQL Server.
What I'm wondering is whether anyone here knows SQL Server vs. MySQL internals well enough to confirm or explain away my current theory as to the cause of this performance difference in UDFs on the two systems. My theory is that SQL Server's optimizer evaluates UDFs differently than MySQL's. Perhaps it's because the table engines are decoupled in MySQL? Or maybe the use of UDFs on SQL Server is more prevalent and the MySQL engine's optimizer simply hasn't evolved as far yet? What I'm thinking is that maybe the SQL Server optimizer treats included UDFs as part of the surrounding query (when possible) and then optimizes the whole thing together. Maybe I'm way off the mark here, but I just never saw this kind of performance hit for using UDFs on SQL Server.
Any light others can shed on this issue will be appreciated.
UDFs have known limitations and problems. Please see: Are UDFs Harmful to SQL Server Performance?
There are many articles on this topic. Hopefully this is a non-subscriber access: Beware Row-by-Row Operations in UDF Clothing
I know this is an old question, but it comes up first in the Google search for "MySQL UDF performance" and does not yet have an adequate answer - one link in the accepted answer is broken, the other does not appear to talk about the specifics of MySQL UDFs.
First, let us make sure we are talking about actual MySQL UDFs. In MySQL, there is a distinction between a "stored function" and a UDF. A stored function is run by the internal stored function/procedure interpreter. A UDF is written in C++ and compiled into a shared library, which the MySQL server loads into memory; when called, it runs as machine code on the CPU. Thus, UDF performance is frequently orders of magnitude better than that of stored functions.
So first of all, make sure you are talking about the actual UDF, and this is not a stored function.
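The DDL makes the distinction easy to spot; a minimal sketch (the function names and the .so file are placeholders):

    -- stored function: interpreted by MySQL's stored-program engine
    CREATE FUNCTION add_tax(price DECIMAL(10,2)) RETURNS DECIMAL(10,2)
        DETERMINISTIC
        RETURN price * 1.20;

    -- true UDF: compiled native code registered from a shared library
    CREATE FUNCTION my_native_fn RETURNS REAL SONAME 'my_udf.so';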
Second, MySQL UDF performance depends on the nature of the algorithm it executes and the quality of its implementation. For example, if your UDF tests all possible triplets of characters in a string that is 1000 bytes long, it will examine a billion combinations and may take several seconds per row. So if removing the UDFs makes your code run significantly faster, the next step is debugging the UDF itself to make sure it is written optimally - or perhaps the question the UDF is trying to answer just cannot be answered quickly.
That said, a well-written UDF that answers a relatively simple question is usually lightning-fast compared to the I/O needed to feed it the data to analyze.
It seems to me, from both personal experience and SO questions and answers, that SQL implementations vary substantially. One of the first issues raised for SQL questions is: what DBMS are you using?
In most cases with SQL there are several ways to structure a given query, even using the same dialect. But I find it interesting that the relative portability of various approaches is frequently not discussed, nor valued very highly when it is.
But even disregarding the likelihood that any given application may or may not be subject to conversion, I'd think that we would prefer our skills, habits, and patterns to be as portable as possible.
In your work with SQL, how strongly do you prefer standard SQL syntax? How actively do you eschew proprietary variations? Please answer without reference to proprietary preferences justified by perceived better performance, which most would concede is usually a sufficiently legitimate defense.
I vote against standard/vendor-independent SQL:
- Only seldom is the database actually switched.
- There is no single database that fully conforms to the current SQL standard. So even when you are standard-conformant, you are not vendor-independent.
- Vendor differences go beyond SQL syntax: locking behaviour is different, isolation levels are different.
- Database testing is pretty tough and underdeveloped. No need to make it even harder by throwing multiple vendors into the game if you don't absolutely need to.
- There is a lot of power in the vendor-specific tweaks (think LIMIT, analytic functions, or hints) - see the sketch just below.
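To make that last point concrete, a sketch using a hypothetical orders table:

    -- MySQL/PostgreSQL LIMIT: proprietary, but concise
    SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;

    -- the standard spelling, supported by fewer engines
    SELECT * FROM orders ORDER BY created_at DESC FETCH FIRST 10 ROWS ONLY;

    -- Oracle analytic function: running total without a self-join
    SELECT customer_id, amount,
           SUM(amount) OVER (PARTITION BY customer_id
                             ORDER BY created_at) AS running_total
    FROM   orders;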
So the quintessence:
- If there is no requirement for vendor independence, specialize for the vendor you are actually using.
- If there is a requirement for vendor independence, make sure that whoever pays the bill understands that it will cost money. Make sure you have every single RDBMS available for testing, and use them too.
- Put every piece of SQL in a special, pluggable layer, so you can use the power of the database AND work with different vendors.
- Only where the difference is purely a question of syntax, go with the standard, e.g. prefer the ANSI standard syntax over the Oracle notation for (outer) joins, as sketched below.
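A sketch of that join-syntax difference (emp and dept are hypothetical tables):

    -- Oracle proprietary outer-join notation
    SELECT e.name, d.name
    FROM   emp e, dept d
    WHERE  e.dept_id = d.id(+);

    -- ANSI standard syntax: the portable choice
    SELECT e.name, d.name
    FROM   emp e
           LEFT JOIN dept d ON e.dept_id = d.id;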
We take it very seriously at our shop. We do not allow non-standard SQL or extensions unless they're supported on ALL of the major platforms. Even then, they're flagged within the code as non-standard and justifications are necessary.
It is not up to the application developer to make their queries run fast; we have a clear separation of duties. The query is to be optimized only by the DBMS itself or by the DBAs' tuning of the DBMS.
Real databases, like DB2/z :-), process standard SQL plenty fast.
The reason we enforce this is to give the customer choice. They don't like the idea of being locked into a specific vendor any more than we do.
In my experience, query portability turns out to be not so important. We work with various data sources (mainly MSSQL and MySQL), but we know which data is stored where and can optimize accordingly. Since we control the systems, we decide when - if ever - structures are moved and queries need to be rewritten.
I also like to use certain other server-specific functionality, such as query notification in SQL Server, which MySQL doesn't offer. So there, again, we use it when we can and don't worry about portability.
Furthermore, parts of our apps need to query schema information and act on it. Here, again, we have server-specific code for the different systems, instead of trying to restrict ourselves to the lowest common denominator.
There is no clear answer whether SQL portability is desirable or not - it really depends a lot on the situation, such as the type of application.
If the application is going to be a service - i.e., only you will ever host it - then obviously nobody but you will care whether your SQL is portable, so you can safely ignore portability as long as you have no specific plans to drop support for your current platform.
If the application is going to be installed at a number of sites, each with its own established database system, SQL portability obviously matters a great deal to people. It widens your potential market, and may give a bit of peace of mind to clients who are on the fence about their database system. Whether you want to support that, or are happy selling only to, for instance, Oracle customers, or only to MySQL/PostgreSQL customers, is up to you and what you think your market is.
If you are coding in PHP, then the vast majority of your potential customers are probably going to expect MySQL. If so, it's not a big deal to assume MySQL. Similarly, if you are in C#/.NET, you could assume Microsoft SQL Server. Again, however, there is a flip side: there may exist a small but less competitive market of PHP or .NET users who want to connect to database systems other than the usual ones.
So I would largely regard this as a market research question, unless as in my first example you are providing a hosted service where it doesn't matter to users, in which case it is for your own convenience only.