I have a table-valued function with quite a lot of code inside: it does multiple join selects, calls sub-functions, and returns a result set. During development, at some point I hit a performance degradation when executing the function. Normally it shouldn't take more than 1 second, but it started taking about 10 seconds. I played a bit with the joins and indexes, but nothing changed dramatically.
After some time spent on changes and research, I wanted to test the results another way. I created the exact same code, with the exact same parameters, as a stored procedure and executed it. Boom! It takes less than 1 second. The exact same code takes about 10 seconds as a function.
I really cannot figure out what this is all about, and I have no more time for research. I need it as a function for several reasons, but I don't know what to do at this point. I thought I could create it as a procedure and then call it within the function, but then I realized that isn't possible inside functions.
I wanted to hear some good views and advice from the experts here.
Thanks in advance.
PS: I did not add any code here because it is not in good shape and quite messy. I will share it if anybody is interested. The server is SQL Server 2014 Enterprise, 64-bit.
Edit: I saw the possible duplicate question before, but it did not satisfy me, as my question is specifically about the performance hit. The other question has many answers about general differences between procedures and functions; I want to focus on possible performance-related differences.
These are the differences from my experience:
When you first start writing the function, you are likely to run it with the same parameters again and again until it works correctly. This warms the page cache, so SQL Server keeps the relevant data in memory.
Functions do not cache their execution plans. As you add more data, it takes longer to come up with a plan. Use SET STATISTICS TIME ON to see query compile time vs. execution time.
Functions can only use table variables, and there are no statistics on those. That can make for some horrendous JOIN decisions later.
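To make the table-variable point concrete, here is a minimal sketch of the difference between a multi-statement and an inline table-valued function; the table variable only comes into play in the multi-statement form. The table, column, and function names are illustrative assumptions, not the asker's code.
-- Multi-statement TVF: results are spooled into a table variable, which has no statistics.
CREATE FUNCTION dbo.fcn_OrdersMulti (@CustomerId int)
RETURNS @result TABLE (OrderId int, Total decimal(18,2))
AS
BEGIN
    INSERT INTO @result (OrderId, Total)
    SELECT OrderId, Total
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
    RETURN;
END
GO
-- Inline TVF: a single SELECT that the optimizer can expand into the calling query,
-- so it behaves much more like a view.
CREATE FUNCTION dbo.fcn_OrdersInline (@CustomerId int)
RETURNS TABLE
AS
RETURN
(
    SELECT OrderId, Total
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
);
GO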
Some people prefer table-valued functions because they are easier to query:
SELECT * FROM fcn_myfunc(...) WHERE <some_conditions>
That is easier than creating a temp table, executing the stored procedure into it, and then filtering off that temp table. Still, if your code is performance critical, turn it into a stored procedure.
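For reference, a minimal sketch of the temp-table-plus-EXEC pattern described above; the procedure name, parameter, and columns are illustrative assumptions.
-- Capture the procedure's result set into a temp table, then filter it.
CREATE TABLE #results (Id int, Amount decimal(18, 2));

INSERT INTO #results (Id, Amount)
EXEC dbo.usp_myproc @SomeParam = 42;

SELECT *
FROM #results
WHERE Amount > 100;

DROP TABLE #results;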
Related
I am working on tuning a stored procedure. It is a huge stored proc and joins tables that have about 6-7 million records.
My question is: how do I determine the time spent in the components of the proc? The proc has one big SELECT with many temp tables created on the fly (read: sub-queries).
I tried using SET STATISTICS TIME ON, SET SHOWPLAN_ALL ON.
I am looking to isolate a chunk of code that takes the most time and not sure of how to do it.
Please help.
PS: I did try to Google it and searched on Stack Overflow, with no luck. Here is one question that I looked at:
How to improve SQL Server query containing nested sub query
Any help is really appreciated. Thanks in advance.
I would try out SQL Sentry's SQL Plan Explorer. It gives you visual help in finding the problem, and it is a free tool. It highlights the bits that cost a lot of I/O or CPU, rather than just a generic percentage.
Here's where you can check it out:
http://www.sqlsentry.net/plan-explorer/sql-server-query-view.asp
Eric
I realize you're asking for "time" (the how long), but maybe you should focus on the "what". What I mean is: tune to the results of the execution plan. Ideally, using "Show Execution Plan" is going to give you the biggest bang, and it will tell you, via percentages, where it costs the most resources.
If you are in SSMS 2008, you can right-click in your query window and click "Include Execution Plan".
In your scenario, the best way to do this is to just run the components individually. Bear in mind that the below is primarily about tuning for execution time (in a low-contention, low-concurrency environment); you may have other priorities under a heavy concurrent load.
I have to do a very similar breakdown on a regular basis for different procedures I have to tune. As a rule, the general methodology I follow is:
1 - Do a baseline run
2 - Add PRINT or RAISERROR commands between portions that output the current time, to help identify which steps take the longest (see the sketch after this list).
3 - Break down the queries individually. I normally run portions on their own (omit JOIN conditions) to see what the variance is. If it is a very long-running query you can add a TOP clause to any SELECTs to limit the returns. As long as you are consistent this will still give you a good idea.
4 - Tweak the components from step 3 that take the most time. If you have complicated subqueries, maybe make them indexed #temp tables to see if that helps. CTEs as a rule never help performance, so you may need to materialize those as well.
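A minimal timing-marker sketch for step 2; the step names are illustrative. RAISERROR ... WITH NOWAIT is used so messages flush immediately instead of being buffered the way PRINT output often is.
DECLARE @msg varchar(200);

SET @msg = 'Start: ' + CONVERT(varchar(30), SYSDATETIME(), 121);
RAISERROR(@msg, 0, 1) WITH NOWAIT;

-- ... step 1: populate #staging ...

SET @msg = 'After step 1: ' + CONVERT(varchar(30), SYSDATETIME(), 121);
RAISERROR(@msg, 0, 1) WITH NOWAIT;

-- ... step 2: main SELECT with the big joins ...

SET @msg = 'After step 2: ' + CONVERT(varchar(30), SYSDATETIME(), 121);
RAISERROR(@msg, 0, 1) WITH NOWAIT;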
I have virtually the same join query; the only difference between my (more than two) queries is the table on which the join is made. Performance-wise, is it better to:
1) rewrite the queries (in one stored procedure?), OR
2) pass the table on which the join is made as a parameter to a stored procedure (written in plpgsql, BTW) and run the query using EXECUTE?
I assume 2) is more elegant, but word is that by using EXECUTE one cannot benefit from query optimization.
Also, what about when I have a varying number of conditions? How can I make sure the query runs in optimal time? (I take it rewriting the query more than 10 times isn't the way to go :D)
If you want to benefit from the query optimization, you should definitely rewrite the queries.
It does result in longer, less elegant code that's harder to maintain, but that is sometimes the price you have to pay for performance.
There is some overhead to using EXECUTE, due to repeated planning of the executed query.
For best results and maintainability, write a function that writes the various functions you need. Example:
PostgreSQL trigger to generate codes for multiple tables dynamically
EXECUTE is dynamic and requires a fresh parse at a minimum, so it carries more overhead.
1) rewrite the queries (in one stored procedure?)
If you have the ability to cache the query plan, do so. Dynamically executing SQL means that the backend needs to re-plan the query every time. Check out PREPARE for more details on this.
2) pass the table on which the join is made as a parameter in a stored procedure (written in plpgsql BTW) and run the query using EXECUTE
Not necessary! PL/pgSQL automatically does a PREPARE/EXECUTE for you. This is one of the primary speed gains to be had from using PL/pgSQL. Rhetorical question: do you think generating the plan shown by EXPLAIN was cheap or easy? Cache that chunk of work if at all possible.
Also, what about when I have a varying number of conditions? How can I make sure the query runs in optimal time? (I take it rewriting the query more than 10 times isn't the way to go :D)
Using individual PREPAREd statements is one way, and the most "bare-metal" way, of optimizing the execution of queries. You could do exotic things like using a single set-returning PL function that you pass different arguments into so that it conditionally executes different SQL, but I wouldn't recommend it. For optimal performance, stick to PREPARE/EXECUTE and manage the named statements inside your application.
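A minimal PREPARE/EXECUTE sketch on the PostgreSQL side; the table and column names are illustrative assumptions.
-- Plan the query once; the prepared statement lives for the session.
PREPARE orders_by_customer (int) AS
    SELECT o.id, o.total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.id = $1;

EXECUTE orders_by_customer(42);   -- reuses the prepared plan
DEALLOCATE orders_by_customer;    -- drop the prepared statement when done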
My primary concern is with SQL Server 2005... I went through many websites and each says something different.
What are the scenarios that are good/OK to use? For example, does it hurt to even set variable values inside an IF, or only if I run a query there? Suppose my SP is building dynamic SQL based on several conditions in the input parameters; do I need to rethink the query? What about an SP that runs a different query based on whether some record exists in a table? Etc., etc. My question is not just limited to these scenarios... I'm looking for a slightly more generalised answer so that I can improve my future SPs.
In essence: which statements are good to use in branching conditions/loops, which are bad, and which are okay?
Generally... Avoid procedural code in your database, and stick to queries. That gives the Query Optimizer the chance to do its job much better.
The exceptions would be code that is designed to do many things rather than produce a result set, and cases where a query would need to join rows exponentially to get a result.
It is very hard to answer this question if you don't provide any code. No language construct is good/bad/okay by itself; it's what you want to achieve and how well that can be expressed with those constructs.
There's no definitive answer as it really depends on the situation.
In general, I think it's best to keep the logic within a sproc as simple and set-based as possible. Making it too complicated, with multiple nested IF conditions for example, may complicate things for the query optimiser, meaning it can't create a good execution plan suitable for all paths through the sproc. For example, the first time the sproc is run, it takes path A through the logic and the execution plan reflects this. The next time it runs with different parameters, it takes path B but reuses the original execution plan, which is not optimal for this second path. One solution is to break the logic into separate stored procedures and call them depending on the path being followed - this allows each sub-sproc to be optimised and its execution plan cached independently, as sketched below.
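A minimal sketch of that split; the procedure names and the @Mode parameter are illustrative assumptions.
CREATE PROCEDURE dbo.usp_GetData
    @Mode tinyint
AS
BEGIN
    -- Each sub-procedure gets its own cached plan,
    -- so path A's plan never gets reused for path B.
    IF @Mode = 1
        EXEC dbo.usp_GetData_PathA;
    ELSE
        EXEC dbo.usp_GetData_PathB;
END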
Loops can sometimes be the only viable option, but in general I'd try not to use them - always try to do things in a set-based fashion where possible, as in the comparison below.
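For illustration, a cursor loop and its set-based equivalent; the table and column names are assumptions.
-- Cursor/loop version: one statement execution per row.
DECLARE @Id int;
DECLARE cur CURSOR FOR
    SELECT Id FROM dbo.Orders WHERE Shipped = 0;
OPEN cur;
FETCH NEXT FROM cur INTO @Id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET Shipped = 1 WHERE Id = @Id;
    FETCH NEXT FROM cur INTO @Id;
END
CLOSE cur;
DEALLOCATE cur;

-- Set-based equivalent: one statement, one plan.
UPDATE dbo.Orders SET Shipped = 1 WHERE Shipped = 0;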
What is faster in SQL Server 2005/2008, a Stored Procedure or a View?
EDIT:
As many of you pointed out, I am being too vague. Let me attempt to be a little more specific.
I wanted to know the performance difference for a particular query in a View, versus the exact same query inside a stored procedure.
(I still appreciate all of the answers that point out their different capabilities)
Stored Procedures (SPs) and SQL Views are different "beasts" as stated several times in this post.
If we exclude some performance considerations (typically minor, except for fringe cases) associated with caching of the query plan, the time associated with binding to a stored procedure, and such, the two approaches are on the whole equivalent, performance-wise. However...
A view is limited to whatever can be expressed in a single SELECT statement (well, possibly with CTEs and a few other tricks); in general, a view is tied to declarative forms of queries. A stored procedure, on the other hand, can use various procedural constructs (as well as declarative ones), and as a result one can hand-craft a way of solving a given query which may be more efficient than what SQL Server's query optimizer would have done on the basis of a single declarative query. In these cases, an SP may be much faster (but beware... the optimizer is quite smart, and it doesn't take much to make an SP much slower than the equivalent view).
Aside from these performance considerations, SPs are more versatile and allow a broader range of queries and actions than views.
Unfortunately, they're not the same type of beast.
A stored procedure is a set of T-SQL statements, and CAN return data. It can perform all kinds of logic, and doesn't necessarily return data in a resultset.
A view is a representation of data. It's mostly used as an abstraction of one or more tables with underlying joins. It's always a resultset of zero, one or many rows.
I suspect your question is more along the lines of:
Which is faster: SELECTing from a view, or the equivalent SELECT statement in a stored procedure, given the same base tables performing the joins with the same where clauses?
This isn't really an answerable question in the sense that no single answer will hold true in all cases. However, as a general answer for a SQL Server-specific implementation...
In general, a stored procedure stands a good chance of being faster than a direct SQL statement, because the server does all sorts of optimizations when a stored procedure is saved and executed for the first time.
A view is essentially a saved SQL statement.
Therefore, I would say that in general, a stored procedure will be likely to be faster than a view IF the SQL statement for each is the same, and IF the SQL statement can benefit from optimizations. Otherwise, in general, they would be similar in performance.
These links reference documentation supporting my answer:
http://www.sql-server-performance.com/tips/stored_procedures_p1.aspx
http://msdn.microsoft.com/en-us/library/ms998577.aspx
Also, if you're looking for all the ways to optimize performance on SQL Server, the second link above is a good place to start.
In short, based on my experience with some complex queries, a stored procedure gives better performance than a function.
But you cannot use the results of a stored procedure in SELECT or JOIN queries.
If you don't need to use the result set in another query, it is better to use an SP.
The rest of the details and differences are covered by people in this forum and elsewhere.
I prefer stored procedures because they allow greater control over data. If you want to build a good, secure, modular system, use stored procedures: they can run multiple SQL commands, have control-of-flow statements, and accept parameters. Everything you can do in a view you can do in a stored procedure, but a stored procedure gives you much more flexibility.
I believe that another way of thinking would be to use stored procedures to select the views. This will make your architecture a loosely coupled system. If you decide to change the schema in the future, you won't have to worry 'so' much that it will break the front end.
I guess what I'm saying is instead of sp vs views, think sp and views :)
Stored procedures and views are different and have different purposes. I look at views as canned queries. I look at stored procedures as code modules.
For example, let's say you have a table called tblEmployees with these two columns (among others): DateOfBirth and MaleFemale.
A view called viewEmployeesMale which filters out only male employees can be very useful. A view called viewEmployeesFemale is also very useful. Both of these views are self-describing and very intuitive.
Now, let's say you need to produce a list of all male employees between the ages of 25 and 30. I would tend to create a stored procedure to produce this result. While it most certainly could be built as a view, in my opinion a stored procedure is better suited to dealing with this; date manipulation, especially where NULLs are a factor, can become very tricky.
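A rough sketch of that split between the view and the procedure; the EmployeeId column, the 'M' code, and the age calculation are illustrative assumptions beyond the columns named above.
CREATE VIEW dbo.viewEmployeesMale
AS
    SELECT EmployeeId, DateOfBirth, MaleFemale
    FROM dbo.tblEmployees
    WHERE MaleFemale = 'M';
GO

CREATE PROCEDURE dbo.uspMaleEmployeesByAgeRange
    @MinAge int,
    @MaxAge int
AS
    SELECT EmployeeId, DateOfBirth
    FROM dbo.viewEmployeesMale
    WHERE DateOfBirth IS NOT NULL
      -- DATEDIFF(YEAR, ...) is only an approximate age; shown for illustration.
      AND DATEDIFF(YEAR, DateOfBirth, GETDATE()) BETWEEN @MinAge AND @MaxAge;
GO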
I know I'm not supposed to turn this into a "discussion", but I'm very interested in this and just thought I'd share my empirical observations of a specific situation, with particular reference to all the comments above which state that an equivalent SELECT statement executed from within a Stored Procedure and a View should have broadly the same performance.
I have a view in database "A" which joins 5 tables in a separate database (db "B"). If I attach to db "A" in SSMS and SELECT * from the view, it takes >3 minutes to return 250000 rows. If I take the select statement from the design page of the view and execute it directly in SSMS, it takes < 25 seconds. Putting the same select statement into a stored procedure gives the same performance when I execute that procedure.
Without making any observations on the absolute performance (db "B" is an AX database which we are not allowed to touch!), I am still absolutely convinced that in this case using an SP is an order of magnitude faster than using a View to retrieve the same data, and this applies to many other similar views in this particular case.
I don't think it has anything to do with creating a connection to the other db, unless by using a view it somehow can never cache the connection whereas the select does, because I can switch between the two selects in the same SSMS window repeatedly and the performance of each query remains consistent. Also, if I connect directly to db "B" and run the select without the dbname.dbo.... refs, it takes the same time.
Any thoughts anyone?
Views:
We can create indexes on views (not possible on stored procedures) - see the sketch after this list.
It's easy to give other DBAs/users an abstract view of table data (exposing only a limited set of columns from multiple tables).
Stored procedures:
We can pass parameters to an SP (not possible with views).
We can execute multiple statements inside a procedure (INSERT, UPDATE, DELETE operations, and so on).
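A minimal indexed-view sketch for the first point above; the table and column names are illustrative assumptions (the summed column must be non-nullable, and COUNT_BIG(*) is required alongside GROUP BY).
CREATE VIEW dbo.vwOrderTotals
WITH SCHEMABINDING            -- required before the view can be indexed
AS
    SELECT CustomerId,
           COUNT_BIG(*)      AS OrderCount,
           SUM(TotalAmount)  AS TotalSpent
    FROM dbo.Orders
    GROUP BY CustomerId;
GO

-- The unique clustered index materializes the view's result set.
CREATE UNIQUE CLUSTERED INDEX IX_vwOrderTotals
    ON dbo.vwOrderTotals (CustomerId);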
A couple other considerations: While performance between an SP and a view are essentially the same (given they are performing the exact same select), the SP gives you more flexibility for that same query.
The SP will support ordering the result set; i.e., including an ORDER BY statement. You cannot do so in a view.
The SP is fully compiled and requires only an EXEC to invoke it. The view still requires a SELECT * FROM view to invoke it; i.e., a SELECT wrapped around the view's stored SELECT.
Found a detailed performance analysis: https://www.scarydba.com/2016/11/01/stored-procedures-not-faster-views/
Compile Time Comparison:
There is a difference in the compile time between the view by itself and the stored procedures (they were almost identical). Let’s look at performance over a few thousand executions:
View AVG: 210.431431431431
Stored Proc w/ View AVG: 190.641641641642
Stored Proc AVG: 200.171171171171
This is measured in microseconds, so the variation we're seeing is likely just some disparity in I/O, CPU or something else, since the differences are trivial at about 10 microseconds, or 5%.
What about execution time including compile time, since there is a difference:
Query duration View AVG: 10089.3226452906
Stored Proc AVG: 9314.38877755511
Stored Proc w/ View AVG: 9938.05410821643
Conclusion:
With the exception of the differences in compile time, we see that views actually perform exactly the same as stored procedures, if the query in question is the same.
I have a table with almost 800,000 records, and I am currently using dynamic SQL to generate the query on the back end. The front end is a search page which takes about 20 parameters and, depending on whether a parameter was chosen, adds an " AND ..." to the base query. I'm curious as to whether dynamic SQL is the right way to go (it doesn't seem like it, because it runs slowly). I am contemplating just creating a denormalized table with all my data. Is this a good idea, or should I build the query all at once instead of building it piece by piece using dynamic SQL? Last thing: is there a way to speed up dynamic SQL?
It is more likely that your indexing (or lack thereof) is causing the slowness than the dynamic SQL.
What does the execution plan look like? Is the same query slow when executed in SSMS? What about when it's in a stored procedure?
If your table is an unindexed heap, it will perform poorly as the number of records grows - this is regardless of the query, and a dynamic query can actually perform better as the nature of the table changes, because a dynamic query is more likely to have its query plan re-evaluated when it's not in the cache. This is not normally an issue (and I would not classify it as a design advantage of dynamic queries) except in the early stages of a system, when SPs have not been recompiled, statistics and query plans are out of date, and the volume of data has just drastically changed.
Not the static one yet. I have with the dynamic query, but it does not give any optimizations. If I ran it with the static query and it gave suggestions, would applying them affect the dynamic query? – Xaisoft
Yes, the dynamic query (EXEC (@sql)) is probably not going to be analyzed unless you analyzed a workload file. – Cade Roux
When you have a search query across multiple tables that are joined, the columns with indexes need to be the search columns as well as the primary key/foreign key columns - but it depends on the cardinality of the various tables. The tuning analyzer should show this. – Cade Roux
I'd just like to point out that if you use this style of optional parameters:
AND (@EarliestDate IS NULL OR PublishedDate < @EarliestDate)
The query optimizer will have no idea whether the parameter is there or not when it produces the query plan, and I have seen cases where the optimizer makes bad choices as a result. A better solution is to build SQL that uses only the parameters you need; the optimizer will then produce the most efficient execution plan. Be sure to use parameterized queries so that the plans are reusable in the plan cache. A minimal sketch follows.
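A sketch of building only the predicates you need while staying parameterized; the table, column, and parameter names are illustrative assumptions.
DECLARE @AuthorId int = 1,
        @EarliestDate datetime = NULL,
        @sql nvarchar(max);

SET @sql = N'SELECT ArticleId, Title FROM dbo.Articles WHERE AuthorId = @AuthorId';

-- Append the predicate only when the optional parameter is supplied.
IF @EarliestDate IS NOT NULL
    SET @sql += N' AND PublishedDate < @EarliestDate';

-- sp_executesql keeps the statement parameterized, so the plan can be cached and reused.
EXEC sp_executesql @sql,
     N'@AuthorId int, @EarliestDate datetime',
     @AuthorId = @AuthorId,
     @EarliestDate = @EarliestDate;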
As in the previous answer, check your indexes and plan.
The question is whether you are using a stored procedure. It's not obvious from the way you worded it. A stored procedure creates a query plan when run, and keeps that plan until recompiled. With varying SQL, you may be stuck with a bad query plan. You could do several things:
1) Add WITH RECOMPILE to the SP definition, which will cause a new plan to be generated on every execution. This has some overhead, which may be acceptable (see the sketch after this list).
2) Use separate SPs, depending on the parameters provided. This will allow better query plan caching.
3) Use client-generated SQL. This will create a query plan each time. If you use parameterized queries, this may allow you to use cached query plans.
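An illustrative sketch of option 1; the procedure name and body are assumptions, not the asker's code.
CREATE PROCEDURE dbo.usp_SearchArticles
    @AuthorId int
WITH RECOMPILE              -- a fresh plan is compiled on every execution
AS
    SELECT ArticleId, Title
    FROM dbo.Articles
    WHERE AuthorId = @AuthorId;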
The only difference between "dynamic" and "static" SQL is the parsing/optimization phase. Once those are done, the query will run identically.
For simple queries, this parsing phase plus the network traffic turns out to be a significant percentage of the total transaction time, so it's good practice to try and reduce these times.
But for large, complicated queries, this processing is overall insignificant compared to the actual path chosen by the optimizer.
I would focus on optimizing the query itself, including perhaps denormalization if you feel that it's appropriate, though I wouldn't do that on a first go around myself.
Sometimes the denormalization can be done at "run time" in the application, using cached lookup tables for example, rather than maintaining it in the database.
Not a fan of dynamic SQL, but if you are stuck with it, you should probably read this article:
http://www.sommarskog.se/dynamic_sql.html
He really goes in depth on the best ways to use dynamic SQL and the issues using it can create.
As others have said, indexing is the most likely culprit. In indexing, one thing people often forget to do is put an index on the FK fields. Since a PK creates an index automatically, many assume an FK will as well. Unfortunately, creating an FK does not create an index. So make sure that any fields you join on are indexed.
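For example, a hypothetical explicit index on a foreign key column; the table and column names are assumptions.
-- The FK constraint alone does not create this index; it must be added explicitly.
CREATE NONCLUSTERED INDEX IX_OrderDetails_OrderId
    ON dbo.OrderDetails (OrderId);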
There may be better ways to create your dynamic SQL, but without seeing the code it is hard to say. I would at least look to see if it uses subqueries and replace them with derived table joins instead. Also, any dynamic SQL that uses a cursor is bound to be slow.
If the parameters are optional, a trick that's often used is to create a procedure like this:
CREATE PROCEDURE GetArticlesByAuthor (
    @AuthorId int,
    @EarliestDate datetime = Null )
AS
SELECT * --not in production code!
FROM Articles
WHERE AuthorId = @AuthorId
AND (@EarliestDate is Null OR PublishedDate < @EarliestDate)
There are some good examples of queries with optional search criteria here: How do I create a stored procedure that will optionally search columns?
As noted, if you are doing a massive query, indexes are the first bottleneck to look at. Make sure that heavily queried columns are indexed. Also, make sure that your query checks all indexed parameters before it checks un-indexed parameters. This makes sure that the results are filtered down using indexes first, and the slow linear search happens only if it has to. So if col2 is indexed but col1 is not, it should look as follows:
WHERE col2 = @col2 AND col1 = @col1
You may be tempted to go overboard with indexes as well, but keep in mind that too many indexes can cause slow writes and heavy disk usage, so don't go too crazy.
I avoid dynamic queries if I can for two reasons. One, they do not save the query plan, so the statement gets compiled each time. The other is that they are hard to manipulate, test, and troubleshoot. (They just look ugly).
I like Dave Kemp's answer above.
I've had some success (in a limited number of instances) with the following logic:
CREATE PROCEDURE GetArticlesByAuthor (
    @AuthorId int,
    @EarliestDate datetime = Null
) AS
SELECT SomeColumn
FROM Articles
WHERE AuthorId = @AuthorId
AND @EarliestDate is Null
UNION
SELECT SomeColumn
FROM Articles
WHERE AuthorId = @AuthorId
AND PublishedDate < @EarliestDate
If you are trying to optimize to below the 1-second range, it may be important to gauge approximately how long it takes to parse and compile the dynamic SQL relative to the actual query execution time:
SET STATISTICS TIME ON;
and then execute the dynamic SQL string "statically" and check the "Messages" tab. I was surprised by these results for a ~10-line dynamic SQL query that returns two rows from a 1M-row table:
SQL Server parse and compile time:
CPU time = 199 ms, elapsed time = 199 ms.
(2 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 4 ms.
Index optimization is unlikely to move the 199 ms barrier much (except perhaps through some analysis/optimization included within the compile time).
However, if the dynamic SQL uses parameters or is repeated, then the compile results may be cached (see Caching Query Plans), which would eliminate the compile time. It would be interesting to know how long cache entries live, how large the cache is, whether it is shared between sessions, etc.