So I have written several stored procedures that act on individual rows of data by taking in an ID number. I would like to keep several stored procedures, at different levels of my database schema, that can call this stored procedure. For instance, when a row is inserted I call this stored procedure. When something else is modified I would like to call this stored procedure for each affected row. This way I can have one set of base code that can be called from everywhere else but that acts on different amounts of data. I have been able to produce this result with cursors, but I am told these are very inefficient. Is there any other way to produce this kind of functionality without sacrificing performance? Thanks.
Yes. Use standard joins to operate on sets rather than RBAR (Row By Agonising Row). i.e. Rather than call a function for each row, design a join that performs the required operation on every applicable row as a set operation.
I often see devs use the 'function operates on each row' approach, and although this seems to be the obvious way to encapsulate logic, it doesn't perform well on SQL Server or most other DB engines.
In some circumstances, a table-valued function can be used effectively (MS SQL Server).
(BTW, you are correct in saying cursors are inefficient).
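To illustrate the difference, here is a minimal sketch (the table, column and procedure names are made up) contrasting the cursor approach with a single set-based statement that does the same work:

-- Hypothetical schema: dbo.Orders(OrderId INT, Amount DECIMAL(10,2), Tax DECIMAL(10,2))

-- RBAR: call a per-row procedure for every applicable row
DECLARE @OrderId INT;
DECLARE order_cursor CURSOR FOR
    SELECT OrderId FROM dbo.Orders WHERE Tax IS NULL;
OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @OrderId;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.usp_CalculateTaxForOrder @OrderId;  -- one call per row
    FETCH NEXT FROM order_cursor INTO @OrderId;
END
CLOSE order_cursor;
DEALLOCATE order_cursor;

-- Set-based: one statement updates every applicable row at once
UPDATE o
SET Tax = o.Amount * 0.2
FROM dbo.Orders AS o
WHERE o.Tax IS NULL;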
I have a stored procedure to write where I only need two attributes from a record. Most likely, it will only be working on one, maybe two or three records at a time. It's also supposed to be a CLR stored procedure.
As a beginner in SQL, it seems to me that passing a table-valued parameter is overkill; however, since this is for work at a relatively big company, I'm wondering if there are other considerations to make. What exactly is the value in using a table-valued parameter as opposed to a normal one?
I don't see the need for CLR outlined in your question, however there are definitely uses for TVPs.
I have seen comma-separated lists being passed into SQL stored procedures; TVPs could be used there instead for better type checking and structure.
It also helps with enterprise-level robustness; I worked on something that would attempt 1.5 million inserts within ~1 hour. Doing this with individual inserts (a proc that accepted a bunch of scalar parameters) was resource-intensive and very, very slow; when converted to perform bulk inserts with Table-Valued Parameters, the operation completed in about 1/5 the time.
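For reference, here is a rough sketch of the TVP approach (the type, table and procedure names are invented for illustration):

-- Define a table type once
CREATE TYPE dbo.OrderLineType AS TABLE
(
    ProductId INT NOT NULL,
    Quantity  INT NOT NULL
);
GO

-- A procedure that accepts the whole batch in one call
CREATE PROCEDURE dbo.usp_InsertOrderLines
    @Lines dbo.OrderLineType READONLY  -- TVPs must be declared READONLY
AS
BEGIN
    INSERT INTO dbo.OrderLines (ProductId, Quantity)
    SELECT ProductId, Quantity FROM @Lines;
END
GO

-- Caller fills the TVP and makes a single round trip instead of one insert per row
DECLARE @Batch dbo.OrderLineType;
INSERT INTO @Batch (ProductId, Quantity) VALUES (1, 10), (2, 5), (3, 7);
EXEC dbo.usp_InsertOrderLines @Lines = @Batch;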
I know many ways to avoid duplicating application code (in my case PHP). However, I am developing a rather big application that does some calculations in the database with the data it finds, and I have noticed the need to use the same code (parts of SQL) in several places.
I don't like the idea of copying and pasting the same thing over and over again. What is a good way to avoid this? Should I use stored procedures? I could almost calculate some of the stuff in PHP, except that most of the time the queries are calculating values based on data that is not otherwise returned by the query, and it seems wasteful to return extra data to PHP just so that it could do its calculations. Sometimes that may be okay, but here it does not feel so.
What should I do?
For example, all over in many SQL queries I am calculating similar to this:
...
(SELECT SUM(IT.amount) FROM InvoiceTransaction IT INNER JOIN Invoice I ON IT.invoiceId = I.id) AS total
...
FROM InvoiceTransaction IT
...
Note that I'm at home now so I'm writing this off the top of my head.
I think you have 2 solutions:
if the SQL returns a small amount of data, I would simply wrap the SQL invocation in a method call and call it (parameterising as necessary)
if the SQL handles a lot of data, I would keep that data in the database and use a stored procedure. You can then call that stored procedure without duplicating the code (but wrap the stored proc call in a function and call it - i.e. as in option 1)
I wouldn't necessarily shy away from stored procedures. But I would advise keeping business logic out of them (keep it in the application itself) and make sure you have sufficient unit testing around it.
I do not prefer stored procedures, especially not for the sake of refactoring. You should consider writing a function that returns the records you need, and put your SQL queries in that function so you can call it instead of putting your SQL everywhere.
I think we would need an example of a query. Stored procs might be a good option, or an alternative might be to use views. One advantage of having your queries in views or stored procs is that you can often use the database to see where your tables are used. The disadvantage is that you are locking yourself into one database; however, you are probably doing that anyway.
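For instance, a sketch of the view approach using the invoice example above (column names guessed from the snippet), so the total calculation lives in exactly one place:

CREATE VIEW dbo.vw_InvoiceTotals
AS
SELECT I.id, SUM(IT.amount) AS total
FROM Invoice I
INNER JOIN InvoiceTransaction IT ON IT.invoiceId = I.id
GROUP BY I.id;
GO

-- Every query (or the PHP layer) can now reuse the same calculation
SELECT id, total
FROM dbo.vw_InvoiceTotals
WHERE total > 1000;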
From the MSDN docs for CREATE FUNCTION:
User-defined functions cannot be used to perform actions that modify the database state.
My question is simply - why?
Yes, a UDF that modifies data may have potentially unwanted side-effects.
Yes, there is overhead involved if a UDF is called thousands of times.
But that is the whole point of design and testing - to ensure that such issues are ironed out before deployment. So why do DB vendors insist on imposing these artificial limitations on developers? What is the point of a language construct that can essentially only be used as a wrapper for select statements?
The reason for this question is as follows: I am writing a function to return a GUID for a certain unique integer ID. If a GUID is already allocated for that ID I simply return it; otherwise I want to generate a new GUID, store that into a table, and return the newly-generated GUID. (Yes, this sounds long-winded and possibly crazy, but when you're sending data to another dev company who believes their design was handed down by God and cannot be improved upon, it's easier just to smile and nod and do what they ask).
I know that I can use a stored procedure with an output parameter to achieve the same result, but then I have to declare a new variable just to hold the result of the sproc. Not only that, I then have to convert my simple select into a while loop that inserts into a temporary table, and call the sproc for every iteration of that loop.
It's usually best to think of the available tools as a spectrum, from Views, through UDFs, out to Stored Procedures. At the one end (Views) you have a lot of restrictions, but this means the optimizer can actually "see through" the code and make intelligent choices. At the other end (Stored Procedures), you've got lots of flexibility, but because you have such freedom, you lose some abilities (e.g. because you can return multiple result sets from a stored proc, you lose the ability to "compose" it as part of a larger query).
UDFs sit in a middle ground - you can do more than you can do in a view (multiple statements, for example), but you don't have as much flexibility as a stored proc. By giving up this freedom, you allow the outputs to be composed as part of a larger query. By not having side effects, you guarantee that, for example, it doesn't matter in which row order the UDF is applied. If you could have side effects, the optimizer might have to give an ordering guarantee.
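As an illustration of that "composable" middle ground, here is a hedged sketch (invented names) of an inline table-valued function that the optimizer can expand into the surrounding query:

-- Inline TVF: a single SELECT, no side effects
CREATE FUNCTION dbo.fn_CustomerOrders (@CustomerId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT OrderId, OrderDate, Amount
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId
);
GO

-- Because it has no side effects, it composes cleanly into a larger query
SELECT c.CustomerName, o.OrderId, o.Amount
FROM dbo.Customers AS c
CROSS APPLY dbo.fn_CustomerOrders(c.CustomerId) AS o
WHERE o.Amount > 100;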
I understand your issue, I think, but taking this from your comment:
I want to do something like select my_udf(my_variable) from my_table, where my_udf either selects or creates the value it returns
So you want a select that (potentially) modifies data. Can you look at that sentence on its own and tell me that that reads perfectly OK? - I certainly can't.
Reading your description of what you actually need to do:
I am writing a function to return a GUID for a certain unique integer ID. If a GUID is already allocated for that ID I simply return it; otherwise I want to generate a new GUID, store that into a table, and return the newly-generated GUID.

I know that I can use a stored procedure with an output parameter to achieve the same result, but then I have to declare a new variable just to hold the result of the sproc. Not only that, I then have to convert my simple select into a while loop that inserts into a temporary table, and call the sproc for every iteration of that loop.
from that last sentence it sounds like you have to process many rows at once, so how about a single INSERT that inserts the GUIDs for those IDs that don't already have them, followed by a single SELECT that returns all the GUIDs that (now) exist?
Sometimes if you cannot implement the solution you came up with, it may be an indication that your solution is not optimal.
Using a statement like this
INSERT INTO IntGuids(IntValue, GuidValue)
SELECT MyIntValues.IntValue, NEWID()
FROM MyIntValues
LEFT OUTER JOIN IntGuids ON MyIntValues.IntValue = IntGuids.IntValue
WHERE IntGuids.IntValue IS NULL
creates all the GUIDs you need to have in 1 statement. No need to SELECT+INSERT for every single value.
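The single SELECT suggested above might then look like this (still assuming the incoming IDs are in MyIntValues):

-- Return the GUID for every requested ID, now that all of them exist
SELECT MyIntValues.IntValue, IntGuids.GuidValue
FROM MyIntValues
INNER JOIN IntGuids ON MyIntValues.IntValue = IntGuids.IntValue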
What is faster in SQL Server 2005/2008, a Stored Procedure or a View?
EDIT:
As many of you pointed out, I am being too vague. Let me attempt to be a little more specific.
I wanted to know the performance difference for a particular query in a View, versus the exact same query inside a stored procedure.
(I still appreciate all of the answers that point out their different capabilities)
Stored Procedures (SPs) and SQL Views are different "beasts" as stated several times in this post.
If we exclude some [typically minor, except for fringe cases] performance considerations associated with the caching of the query plan, the time associated with binding to a Stored Procedure and such, the two approaches are on the whole equivalent, performance-wise. However...
A view is limited to whatever can be expressed in a single SELECT statement (well, possibly with CTEs and a few other tricks), but in general, a view is tied to declarative forms of queries. A stored procedure, on the other hand, can use various procedural-type constructs (as well as declarative ones), and as a result, using SPs, one can hand-craft a way of solving a given query which may be more efficient than what SQL Server's query optimizer may have done (on the basis of a single declarative query). In these cases, an SP may be much faster (but beware... the optimizer is quite smart, and it doesn't take much to make an SP much slower than the equivalent view.)
Aside from these performance considerations, the SPs are more versatile and allow a broader range of inquiries and actions than the views.
Unfortunately, they're not the same type of beast.
A stored procedure is a set of T-SQL statements, and CAN return data. It can perform all kinds of logic, and doesn't necessarily return data in a resultset.
A view is a representation of data. It's mostly used as an abstraction of one or more tables with underlying joins. It's always a resultset of zero, one or many rows.
I suspect your question is more along the lines of:
Which is faster: SELECTing from a view, or the equivalent SELECT statement in a stored procedure, given the same base tables performing the joins with the same where clauses?
This isn't really an answerable question, in that no single answer will hold true in all cases. However, as a general answer for a SQL Server-specific implementation...
In general, a Stored Procedure stands a good chance of being faster than a direct SQL statement because the server does all sorts of optimizations when a stored procedure is saved and executed for the first time.
A view is essentially a saved SQL statement.
Therefore, I would say that in general, a stored procedure will be likely to be faster than a view IF the SQL statement for each is the same, and IF the SQL statement can benefit from optimizations. Otherwise, in general, they would be similar in performance.
See these links for documentation supporting my answer:
http://www.sql-server-performance.com/tips/stored_procedures_p1.aspx
http://msdn.microsoft.com/en-us/library/ms998577.aspx
Also, if you're looking for all the ways to optimize performance on SQL Server, the second link above is a good place to start.
In short, based on my experience with some complex queries, a stored procedure gives better performance than a function.
But you cannot use the results of a stored procedure in SELECT or JOIN queries.
If you don't need to use the result set in another query, it is better to use an SP.
The rest of the details and differences are covered by others in this thread and elsewhere.
I prefer stored procedures because they allow greater control over data. If you want to build a good, secure, modular system, then use stored procedures: they can run multiple SQL commands, have control-of-flow statements, and accept parameters. Everything you can do in a view you can do in a stored procedure, but in a stored procedure you can do it with much more flexibility.
I believe that another way of thinking would be to use stored procedures to select the views. This will make your architecture a loosely coupled system. If you decide to change the schema in the future, you won't have to worry 'so' much that it will break the front end.
I guess what I'm saying is instead of sp vs views, think sp and views :)
Stored procedures and views are different and have different purposes. I look at views as canned queries. I look at stored procedures as code modules.
For example let's say you have a table called tblEmployees with these two columns (among others): DateOfBirth and MaleFemale.
A view called viewEmployeesMale which filters out only male employees can be very useful. A view called viewEmployeesFemale is also very useful. Both of these views are self-describing and very intuitive.
Now, let's say you need to produce a list of all male employees between the ages of 25 and 30. I would tend to create a stored procedure to produce this result. While it most certainly could be built as a view, in my opinion a stored procedure is better suited for dealing with this. Date manipulation, especially where nulls are a factor, can become very tricky.
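A rough sketch of what I mean, using the columns above (syntax simplified; I'm assuming MaleFemale holds 'M'/'F', and the exact age arithmetic is only illustrative):

CREATE VIEW viewEmployeesMale
AS
SELECT *
FROM tblEmployees
WHERE MaleFemale = 'M'
GO

CREATE PROCEDURE uspMaleEmployeesByAgeRange
    @MinAge INT,
    @MaxAge INT
AS
BEGIN
    -- The date manipulation (and the NULL handling) lives in one place
    SELECT *
    FROM viewEmployeesMale
    WHERE DateOfBirth IS NOT NULL
      AND DateOfBirth <= DATEADD(YEAR, -@MinAge, GETDATE())
      AND DateOfBirth >  DATEADD(YEAR, -(@MaxAge + 1), GETDATE())
END
GO

-- e.g. male employees between 25 and 30
EXEC uspMaleEmployeesByAgeRange @MinAge = 25, @MaxAge = 30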
I know I'm not supposed to turn this into a "discussion", but I'm very interested in this and just thought I'd share my empirical observations of a specific situation, with particular reference to all the comments above which state that an equivalent SELECT statement executed from within a Stored Procedure and a View should have broadly the same performance.
I have a view in database "A" which joins 5 tables in a separate database (db "B"). If I attach to db "A" in SSMS and SELECT * from the view, it takes >3 minutes to return 250000 rows. If I take the select statement from the design page of the view and execute it directly in SSMS, it takes < 25 seconds. Putting the same select statement into a stored procedure gives the same performance when I execute that procedure.
Without making any observations on the absolute performance (db "B" is an AX database which we are not allowed to touch!), I am still absolutely convinced that in this case using an SP is an order of magnitude faster than using a View to retrieve the same data, and this applies to many other similar views in this particular case.
I don't think it's anything to do with creating a connection to the other db, unless by using a view it somehow can never cache the connection whereas the select does, because I can switch between the 2 selects in the same SSMS window repeatedly and the performance of each query remains consistent. Also, if I connect directly to db "B" and run the select without the dbname.dbo.... refs, it takes the same time.
Any thoughts anyone?
Views:
You can create an index on a view (not possible on a stored proc); see the sketch below.
It's easy to give abstracted views of table data (limited column access across multiple tables) to other DBAs/users.

Stored procedures:
You can pass parameters to an SP (not possible with views).
You can execute multiple statements inside a procedure (e.g. INSERT, UPDATE and DELETE operations).
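On the first point, a hedged sketch of an indexed (materialized) view, with made-up names and assuming the Amount column is declared NOT NULL:

CREATE VIEW dbo.vwSalesTotals
WITH SCHEMABINDING  -- required for an indexed view
AS
SELECT  ProductId,
        SUM(Amount)  AS TotalAmount,
        COUNT_BIG(*) AS RowCnt  -- required when the view aggregates
FROM dbo.SalesOrderLines
GROUP BY ProductId
GO

-- The unique clustered index is what materializes the view
CREATE UNIQUE CLUSTERED INDEX IX_vwSalesTotals ON dbo.vwSalesTotals (ProductId)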
A couple of other considerations: while the performance of an SP and a view is essentially the same (given they are performing the exact same select), the SP gives you more flexibility for that same query.
The SP will support ordering the result set; i.e., including an ORDER BY statement. You cannot do so in a view.
The SP is fully compiled and requires only an exec to invoke it. The view still requires a SELECT * FROM view to invoke it; i.e., a select on the compiled select in the view.
Found a detailed performance analysis: https://www.scarydba.com/2016/11/01/stored-procedures-not-faster-views/
Compile Time Comparison:
There is a difference in the compile time between the view by itself and the stored procedures (they were almost identical). Let’s look at performance over a few thousand executions:
View AVG: 210.431431431431
Stored Proc w/ View AVG: 190.641641641642
Stored Proc AVG: 200.171171171171
This is measured in microseconds, so the variation we're seeing is likely just some disparity in I/O, CPU or something else, since the differences are trivial at around 10 microseconds, or 5%.
What about execution time including compile time, since there is a difference:
Query duration View AVG: 10089.3226452906
Stored Proc AVG: 9314.38877755511
Stored Proc w/ View AVG: 9938.05410821643
Conclusion:
With the exception of the differences in compile time, we see that views actually perform exactly the same as stored procedures, if the query in question is the same.
I want to iterate through a table/view and then kick off some process (e.g. run a job, send an email) based on some criteria.
My arbitrary constraint here is that I want to do this inside the database itself, using T-SQL on a stored proc, trigger, etc.
Does this scenario require cursors, or is there some other native T-SQL row-based operation I can take advantage of?
Your best bet is a cursor. SQL being declarative and set-based, any 'workaround' you may find that tries to force SQL to do imperative row-oriented operations is unreliable and may break. E.g. the optimizer may cut your 'operation' out of the execution, or do it in a strange order or an unexpected number of times.
The bad name cursors generally get comes from cases where they are deployed instead of set-based operations (like doing a computation and an update, or returning a report) because the developer did not find a set-oriented way of achieving the same functionality. But for non-SQL operations (i.e. launching a process) they are appropriate.
You may also use some variations on the cursor theme, like client side iterating through a result set. That is very similar in spirit to a cursor, although not using explicit cursors.
The standard way to do this would be SSIS. Just use an Execute SQL task to get the rows, and a For Each task container to iterate once per row. Inside the container, run whatever tasks you like, and they'll have access to the columns of each row.
If you are planning on sending an email to each record with an email address (or a similar row-based operation) then you would indeed plan on using a cursor.
There is no other "row-based" operation that you'd do within SQL itself (although I like John's suggestion to investigate SSIS, as long as you have SQL Server Standard or Enterprise). However, if you are summing, searching or performing any other kind of set operation and then kicking off an event once the entire selection set has been processed, then you would certainly not use a cursor. Just so you know: cursors are generally considered a "last resort" approach to problems in SQL Server.
The first thought which comes to my mind when I need to iterate over the result set of a query is to use cursors. Yes, it is a quick and dirty way of programming. But cursors have their drawbacks as well: they incur overhead and can become performance bottlenecks.
There are alternatives to using cursors. You can try using a temp table with an identity column: copy your table into the temp table and use a while loop to iterate over the rows, then call your stored procedure based on a condition.
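A hedged sketch of that pattern (the table, column and procedure names are placeholders):

-- Copy the rows to process into a temp table with an identity column
CREATE TABLE #Work
(
    RowNum INT IDENTITY(1,1) PRIMARY KEY,
    Id     INT NOT NULL
)

INSERT INTO #Work (Id)
SELECT Id
FROM dbo.SourceTable
WHERE EmailAddress IS NOT NULL  -- whatever your criteria are

DECLARE @i INT, @max INT, @Id INT
SET @i = 1
SELECT @max = MAX(RowNum) FROM #Work

WHILE @i <= @max
BEGIN
    SELECT @Id = Id FROM #Work WHERE RowNum = @i

    EXEC dbo.usp_DoSomethingForRow @Id  -- e.g. send an email, start a job

    SET @i = @i + 1
END

DROP TABLE #Work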
Here, check this link for alternatives to cursors - http://searchsqlserver.techtarget.com/tip/0,289483,sid87_gci1339242,00.html
cheers