Best Practice: One Stored Proc that always returns all fields or different stored procedure for each field set needed? - sql

If I have a table with Field 1, Field 2, Field 3 and Field 4, and in one instance I need just Field 1 and Field 2, in another I need Field 3 and Field 4, and in yet another I need all of them...
Is it better to have an SP for each combination I need, or one SP that always returns them all?

Very important question:
Writing many stored procs that run the same query will make you spend a lot of time documenting and apologising to future maintainers.
Every time anyone wants to introduce a change, they have to consider whether it should apply to all the stored procs, to some of them, or to only one...
I would do only one stored proc.

I would just have one Stored Procedure as it will be easier to maintain.
Does it need to be a Stored Procedure? You could rewrite it as a View and then simply select the columns that you need.
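For illustration, a minimal sketch of that view approach, using hypothetical object names based on the Field 1..Field 4 wording of the question:

CREATE VIEW dbo.vMyTable
AS
SELECT Field1, Field2, Field3, Field4
FROM dbo.MyTable;   -- hypothetical table name
GO

-- Each caller then selects only the columns it needs:
SELECT Field1, Field2 FROM dbo.vMyTable;                   -- first use case
SELECT Field3, Field4 FROM dbo.vMyTable;                   -- second use case
SELECT Field1, Field2, Field3, Field4 FROM dbo.vMyTable;   -- third use case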

If network bandwidth and memory usage are more important than hours of work and project simplicity, then make a separate SP for each task. Otherwise there's no point (the gains aren't that great, and they are noticeable only when the rowset is extremely large or there are a lot of simultaneous requests).

As a general rule it is good practice to select only the columns we need to serve a particular purpose. This is particularly true for tables which have:
lots of columns
LOB columns
sensitive or restricted data
However, if we have a complicated system with lots of tables it is obviously impractical to build a separate stored procedure for each distinct query. In fact it is probably undesirable to do so. The resultant API would be overwhelming to use and a lot of effort to maintain.
The solutions are several and various, and really depend on the nature of the applications. Views can help, although they share some of the same maintenance issues. Dynamic SQL is another approach. We can write complicated procedures which return many different result sets depending on the input parameters. Heck, sometimes we can even write SQL statements in the actual application.
Oh, and there is the simple procedure which basically wraps a SELECT * FROM some_table but that comes with its own suite of problems.
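To illustrate the "one procedure, several result shapes" idea mentioned above, here is a minimal sketch; the procedure, table and column names are assumptions, not from the question:

CREATE PROCEDURE dbo.GetMyTable
    @ColumnSet varchar(10) = 'ALL'   -- 'BASIC', 'EXTENDED' or 'ALL'
AS
BEGIN
    SET NOCOUNT ON;

    IF @ColumnSet = 'BASIC'
        SELECT Field1, Field2 FROM dbo.MyTable;
    ELSE IF @ColumnSet = 'EXTENDED'
        SELECT Field3, Field4 FROM dbo.MyTable;
    ELSE
        SELECT Field1, Field2, Field3, Field4 FROM dbo.MyTable;
END;

The same shape-switching could also be done with dynamic SQL via sp_executesql when the column combinations aren't known in advance.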

Related

Stored procedure / query needing additional field in 2% of all use-cases. Best practises?

I have a stored procedure used in around 50 different places within a large application. One of these 50 contexts would require an additional field from this procedure.
I can think of two different approaches:
Update the stored procedure to return the missing field.
That means that I will always request some additional data that will be useless in the other 49 of the 50 use-cases.
Leave the stored procedure as it is (without the needed additional data).
And when I need the additional data (1 of 50 use-cases), query the database after the SP returns to enhance the SP result set into a dedicated DTO.
In this case I would add this extra code into my business layer.
That means extra code and getting all the data in two queries instead of one.
What would be the best practices?
"Best practice" in my opinion is not a hugely useful phrase - it depends on what you're trying to optimize for.
If your application code can handle the additional field, there are many arguments for just adding that field to the result set. It's the least effort, and the least complexity. There are no surprises - it's common for stored procedures to return a result set, and the application code to then decide what to do with that result set (including ignoring some columns). As a general "best practice" principle, having a single stored procedure return a full result set, and having the client code decide what to do with that result set, makes the proc easy to use and re-use, and as long as you don't change the result set, you can change the implementation of the procedure. The overhead of transferring a few extra bytes is negligible in most cases - if it is a problem, you have bigger problems!
EDIT in response to comment from @vvgiri:
There may be a performance impact if you have to join to other tables to get the additional data. In practice, as long as the join is a simple "foreign key/primary key" join, it will likely have no measurable impact unless you have huge data sets, but it's worth testing.
The problem with that approach is that you may need to test all 50 places where the stored proc is used. You don't write about this in your question; if it's not an issue, I would just add the extra field.
If it is a problem, I'd consider the maintainability and "least surprise" factors. Without specifics, it's hard to say whether I'd be surprised if client code calls one stored procedure, and then uses the result of that to get more data. But if that would be surprising, I'd consider writing a new procedure, with a name that makes clear why it exists. That procedure should probably call the original proc, and then do the enriching, returning the result set your client code needs.
This approach means your client code stays "clean" - no surprises or special cases. It re-uses the original proc - no duplication. It may be slower than modifying the original proc, but no slower than your "option 2".
Personally, I think I'd just include a parameter in my procedure that I can set in my execution query if I want the extra columns, and use this parameter to adjust the output as needed.
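A minimal sketch of that idea, assuming a hypothetical dbo.Orders procedure and hypothetical column names; the flag-controlled column list is the only point being shown:

CREATE PROCEDURE dbo.GetOrders
    @IncludeExtra bit = 0   -- set to 1 only in the single context that needs the extra field
AS
BEGIN
    SET NOCOUNT ON;

    IF @IncludeExtra = 1
        SELECT o.Id, o.OrderDate, o.CustomerId, o.ExtraField   -- hypothetical columns
        FROM dbo.Orders AS o;
    ELSE
        SELECT o.Id, o.OrderDate, o.CustomerId
        FROM dbo.Orders AS o;
END;

Existing callers keep working with the default of 0; the one special caller passes @IncludeExtra = 1.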

SQL Server - Select * vs Select Column in a Stored Procedure

In an ad-hoc query using Select ColumnName is better, but does it matter in a Stored Procedure after it's saved in the plan guide?
Always explicitly state the columns, even in a stored procedure. SELECT * is considered bad practice.
For instance you don't know the column order that will be returned, some applications may be relying on a specific column order.
I.e. the application code may look something like:
Id = Column[0]; // bad design
If you've used SELECT *, Id may no longer be the first column, which could cause the application to crash. Also, if the database is modified and an additional 5 fields have been added, you are returning additional fields that may not be relevant.
These topics always elicit blanket statements like ALWAYS do this or NEVER do that, but the reality is, like with most things it depends on the situation. I'll concede that it's typically good practice to list out columns, but whether or not it's bad practice to use SELECT * depends on the situation.
Consider a variety of tables that all have a common field or two. For example, we have a number of tables that have different layouts, but they all have 'access_dt' and 'host_ip'. These tables aren't typically used together, but there are instances when suspicious activity prompts a full report of all activity. These aren't common, and they are manually reviewed; as such, they are well served by a stored procedure that generates a report by looping through every log table and using SELECT *, leveraging the common fields between all the tables.
It would be a waste of time to list out fields in this situation.
Again, I agree that it's typically good practice to list out fields, but it's not always bad practice to use SELECT *.
Edit: Tried to clarify example a bit.
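For what it's worth, a minimal sketch of such a report procedure body; only access_dt and host_ip come from the example above, while the 'log_' naming convention and the varchar length are assumptions:

DECLARE @host_ip varchar(45) = '10.0.0.5';   -- value under investigation
DECLARE @sql nvarchar(max) = N'';

-- Build one SELECT * per log table; each returns its own result set for manual review.
SELECT @sql = @sql
    + N'SELECT * FROM ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(t.name)
    + N' WHERE host_ip = @host_ip ORDER BY access_dt; '
FROM sys.tables AS t
WHERE t.name LIKE 'log[_]%';   -- hypothetical naming convention for the log tables

EXEC sp_executesql @sql, N'@host_ip varchar(45)', @host_ip = @host_ip;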
It's a best practice in general, but if you actually do need all the columns, you may as well use the more quickly read "SELECT *".
The important thing is to avoid retrieving data you don't need.
It is considered bad practice in situations like stored procedures when you are querying large datasets with table scans. You want to avoid table scans because they hurt the performance of the query. It's also a matter of readability.
Some other food for thought: if your query has any joins at all, you are returning data you don't need, because the data in the join columns is the same. Further, if the table is later changed to add some things you don't need (such as columns for audit purposes), you may be returning data to the user that they should not be seeing.
Nobody has mentioned the case when you need ALL columns from a table, even if the columns change, e.g. when archiving table rows as XML. I agree one should not use "SELECT *" as a replacement for "I need all the columns that currently exist in the table," just out of laziness or for readability. There needs to be a valid reason. It could be essential when one needs "all the columns that could exist in the table."
Also, how about when creating "wrapper" views for tables?
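As a small sketch of the archiving case, assuming a hypothetical dbo.Orders table and an archive table with an xml column; SELECT * here deliberately means "all the columns the table has at the time the row is archived":

DECLARE @OrderId int = 42;   -- hypothetical key value

INSERT INTO dbo.OrdersArchive (OrderId, RowXml, ArchivedAt)
SELECT @OrderId,
       (SELECT * FROM dbo.Orders WHERE Id = @OrderId FOR XML RAW, ELEMENTS, TYPE),
       GETDATE();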

SQL Server execute several commands simultaneously

I have a stored proc that I need to execute 100 times (one for each parameter). I was wondering if I could execute these all at the same time using batch or something similar so that it would speed up processing instead of executing one and then the next.
Thanks!
Can you rewrite your procedure to accept a TABLE parameter, fill it with the 100 values, and process the table instead of 100 scalars?
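A minimal sketch of that table-valued-parameter approach (all object names here are assumptions; table-valued parameters require SQL Server 2008 or later):

CREATE TYPE dbo.IdList AS TABLE (Id int NOT NULL PRIMARY KEY);
GO

CREATE PROCEDURE dbo.ProcessMany
    @Ids dbo.IdList READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- One set-based statement instead of 100 separate executions.
    UPDATE t
    SET    t.Processed = 1
    FROM   dbo.SomeTable AS t
    INNER JOIN @Ids AS i ON i.Id = t.Id;
END;
GO

-- The caller fills the table parameter with the 100 values and makes a single call:
DECLARE @Ids dbo.IdList;
INSERT INTO @Ids (Id) VALUES (1), (2), (3);   -- ...and so on, up to 100 rows
EXEC dbo.ProcessMany @Ids = @Ids;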
To directly answer your question, you could open 100 separate connections and execute 100 separate queries concurrently.
But as has been mentioned, I don't think this is the solution for you.
As you have a table with the 100 values, it seems you have a few options...
Change the StoredProcedure to a View, and join on the view.
Change the StoredProcedure to a Table Valued Function, and use CROSS APPLY.
(Inline functions will perform a lot better than Multi-Statement functions; a sketch of this option appears at the end of this answer.)
These two are limited by the fact that neither a view nor a function can have any side-effects... No writing to tables, etc, etc.
If you can't refactor your code to use a view or function, you still need a stored procedure to encapsulate the code.
In that case, you could either:
- pass the table of values in as a Table Valued Parameter.
- or simply have the Stored Procedure read from the table directly.
Depending on your needs, you may even wish to create a table specifically for this SP to read from. This introduces a couple of extra issues though...
- Concurrency : How do I keep my data separate from someone else's? Have a field to hold a unique identifier, such as the session's @@SPID.
- Clean Up : You don't want processes inserting data all day, but never deleting it.
The one thing I would strongly suggest you avoid is using loops/cursors. If you can find a set based approach, use it :)
EDIT
A comment you just left mentions that you have millions of records to process.
This makes using set based approaches much more preferable. You may, however, find that this creates extremely large transactions (if you're doing a lot of INSERTs, UPDATEs, etc). In which case, still find a set based approach, then find a way of doing this in smaller pieces (say split by day if the data is time related, or just 1000 records at a time, whatever fits.)
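For the inline table-valued function / CROSS APPLY option mentioned above, a minimal sketch (the function, table and column names are assumptions):

CREATE FUNCTION dbo.fnGetDetails (@Id int)
RETURNS TABLE
AS
RETURN
(
    SELECT d.Id, d.SomeValue
    FROM   dbo.Details AS d
    WHERE  d.ParentId = @Id
);
GO

-- Apply the function to every row of the parameter table in a single set-based query:
SELECT p.Id, x.SomeValue
FROM   dbo.ParameterTable AS p
CROSS APPLY dbo.fnGetDetails(p.Id) AS x;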

Avoiding duplicating SQL code?

I know many ways to avoid duplicating code (in my case PHP). However, I am developing a rather big application that does some calculations on the database with the data it finds, and I have noticed the need to use the same code (parts of SQL) in other places.
I don't like the idea of copying and pasting the same thing over and over again. What is a good way to do this? Should I use stored procedures? I could almost calculate some of the stuff in PHP, except that most of the time the queries are calculating values based on data not returned by the query, and it seems stupid to return extra data to PHP so that it could do its calculations. Sometimes that may be okay, but right now it does not feel so.
What should I do?
For example, all over in many SQL queries I am calculating similar to this:
...
(SELECT SUM(amount) FROM IT INNER JOIN Invoice I WHERE IT.invoiceId=I.id) AS total
...
FROM InvoiceTransaction IT
...
Note that I'm at home now so I'm writing this off the top of my head.
I think you have 2 solutions:
if the SQL returns a small amount of data, I would simply wrap the SQL invocation in a method call and call it (parameterising as necessary)
if the SQL handles a lot of data, I would keep that data in the database and use a stored procedure. You can then call that stored procedure without duplicating the code (but wrap the stored proc call in a function and call it - i.e. as in option 1)
I wouldn't necessarily shy away from stored procedures. But I would advise keeping business logic out of them (keep it in the application itself) and make sure you have sufficient unit testing around it.
I do not prefer stored procedures, especially not for the sake of refactoring. You should consider writing a function that returns the records you need, and put your SQL queries in that function so you can call it instead of putting your SQL everywhere.
I think we would need an example of a query. Stored procs might be a good option. Or an alternative might be to use views. One advantage in having your queries in views or stored procs is that you can often use the database to see where your tables are used. Disadvantage is that you are locking yourself into one database, however you are probably doing this anyway.
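To make the view suggestion concrete, here is a minimal sketch based on the Invoice / InvoiceTransaction names from the question (the exact column names are partly assumed):

CREATE VIEW dbo.InvoiceTotals
AS
SELECT I.id AS invoiceId,
       SUM(IT.amount) AS total
FROM   dbo.Invoice AS I
INNER JOIN dbo.InvoiceTransaction AS IT ON IT.invoiceId = I.id
GROUP BY I.id;
GO

-- Any query (or PHP call site) that needs the total can join the view
-- instead of repeating the SUM subquery:
SELECT IT.id, V.total
FROM   dbo.InvoiceTransaction AS IT
INNER JOIN dbo.InvoiceTotals AS V ON V.invoiceId = IT.invoiceId;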

SQL-Server Performance: What is faster, a stored procedure or a view?

What is faster in SQL Server 2005/2008, a Stored Procedure or a View?
EDIT:
As many of you pointed out, I am being too vague. Let me attempt to be a little more specific.
I wanted to know the performance difference for a particular query in a View, versus the exact same query inside a stored procedure.
(I still appreciate all of the answers that point out their different capabilities)
Stored Procedures (SPs) and SQL Views are different "beasts" as stated several times in this post.
If we exclude some [typically minor, except for fringe cases] performance considerations associated with the caching of the query plan, the time associated with binding to a Stored Procedure and such, the two approaches are on the whole equivalent, performance-wise. However...
A view is limited to whatever can be expressed in a single SELECT statement (well, possibly with CTEs and a few other tricks); in general, a view is tied to declarative forms of queries. A stored procedure, on the other hand, can use various procedural constructs (as well as declarative ones), and as a result, using SPs one can hand-craft a way of solving a given query which may be more efficient than what SQL Server's query optimizer would have done (on the basis of a single declarative query). In these cases, an SP may be much faster (but beware... the optimizer is quite smart, and it doesn't take much to make an SP much slower than the equivalent view.)
Aside from these performance considerations, the SPs are more versatile and allow a broader range of inquiries and actions than the views.
Unfortunately, they're not the same type of beast.
A stored procedure is a set of T-SQL statements, and CAN return data. It can perform all kinds of logic, and doesn't necessarily return data in a resultset.
A view is a representation of data. It's mostly used as an abstraction of one or more tables with underlying joins. It's always a resultset of zero, one or many rows.
I suspect your question is more along the lines of:
Which is faster: SELECTing from a view, or the equivalent SELECT statement in a stored procedure, given the same base tables performing the joins with the same where clauses?
This isn't really an answerable question, in that no single answer will hold true in all cases. However, as a general answer for a SQL Server-specific implementation...
In general, a stored procedure stands a good chance of being faster than a direct SQL statement because the server does all sorts of optimizations when a stored procedure is saved and executed for the first time.
A view is essentially a saved SQL statement.
Therefore, I would say that in general, a stored procedure will likely be faster than a view IF the SQL statement for each is the same, and IF the SQL statement can benefit from optimizations. Otherwise, in general, they would be similar in performance.
These links reference documentation supporting my answer:
http://www.sql-server-performance.com/tips/stored_procedures_p1.aspx
http://msdn.microsoft.com/en-us/library/ms998577.aspx
Also, if you're looking for all the ways to optimize performance on SQL Server, the second link above is a good place to start.
In short, based on my experience with some complex queries, a stored procedure gives better performance than a function.
But you cannot use the results of a stored procedure in SELECT or JOIN queries.
If you don't want to use the result set in another query, it's better to use an SP.
The rest of the details and differences are mentioned by people in this forum and elsewhere.
I prefer stored procedures because they allow greater control over data. If you want to build a good, secure, modular system, then use stored procedures: a stored procedure can run multiple SQL commands, has control-of-flow statements, and accepts parameters. Everything you can do in a view you can do in a stored procedure, but in a stored procedure you have much more flexibility.
I believe that another way of thinking would be to use stored procedures to select the views. This will make your architecture a loosely coupled system. If you decide to change the schema in the future, you won't have to worry 'so' much that it will break the front end.
I guess what I'm saying is instead of sp vs views, think sp and views :)
Stored procedures and views are different and have different purposes. I look at views as canned queries. I look at stored procedures as code modules.
For example let's say you have a table called tblEmployees with these two columns (among others): DateOfBirth and MaleFemale.
A view called viewEmployeesMale which filters out only male employees can be very useful. A view called viewEmployeesFemale is also very useful. Both of these views are self describing and very intuitive.
Now, let's say you need to produce a list of all male employees between the ages of 25 and 30. I would tend to create a stored procedure to produce this result. While it most certainly could be built as a view, in my opinion a stored procedure is better suited for dealing with this. Date manipulation, especially where nulls are a factor, can become very tricky.
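A minimal sketch of both objects; only tblEmployees, DateOfBirth and MaleFemale come from the description above, while everything else (the EmployeeId column, the 'M' code, the procedure name) is assumed:

CREATE VIEW dbo.viewEmployeesMale
AS
SELECT EmployeeId, DateOfBirth, MaleFemale   -- EmployeeId is a hypothetical column
FROM dbo.tblEmployees
WHERE MaleFemale = 'M';
GO

CREATE PROCEDURE dbo.uspMaleEmployeesByAge
    @MinAge int = 25,
    @MaxAge int = 30
AS
BEGIN
    SET NOCOUNT ON;

    SELECT EmployeeId, DateOfBirth
    FROM dbo.viewEmployeesMale
    WHERE DateOfBirth IS NOT NULL
      AND DateOfBirth <= DATEADD(YEAR, -@MinAge, GETDATE())          -- at least @MinAge years old
      AND DateOfBirth >  DATEADD(YEAR, -(@MaxAge + 1), GETDATE());   -- has not yet turned @MaxAge + 1
END;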
I know I'm not supposed to turn this into a "discussion", but I'm very interested in this and just thought I'd share my empirical observations of a specific situation, with particular reference to all the comments above which state that an equivalent SELECT statement executed from within a Stored Procedure and a View should have broadly the same performance.
I have a view in database "A" which joins 5 tables in a separate database (db "B"). If I attach to db "A" in SSMS and SELECT * from the view, it takes >3 minutes to return 250000 rows. If I take the select statement from the design page of the view and execute it directly in SSMS, it takes < 25 seconds. Putting the same select statement into a stored procedure gives the same performance when I execute that procedure.
Without making any observations on the absolute performance (db "B" is an AX database which we are not allowed to touch!), I am still absolutely convinced that in this case using an SP is an order of magnitude faster than using a View to retrieve the same data, and this applies to many other similar views in this particular case.
I don't think it's anything to do with creating a connection to the other db, unless by using a view it somehow can never cache the connection whereas the select does, because I can switch between the 2 selects in the same SSMS window repeatedly and the performance of each query remains consistent. Also, if I connect directly to db "B" and run the select without the dbname.dbo.... refs, it takes the same time.
Any thoughts anyone?
Views:
- We can create an index on a view (not possible on a stored proc); a sketch of this follows below.
- It's easy to give other DBAs/users abstract views of table data (access to only a limited set of columns across multiple tables).
Stored Procedures:
- We can pass parameters to an SP (not possible with views).
- We can execute multiple statements inside a procedure (INSERT, UPDATE, DELETE operations, and so on).
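As a sketch of the indexed-view point (all names are hypothetical; note the SCHEMABINDING and COUNT_BIG(*) requirements for indexed views that aggregate):

CREATE VIEW dbo.vSalesTotals
WITH SCHEMABINDING
AS
SELECT ProductId,
       COUNT_BIG(*) AS RowCnt,                 -- required in an aggregating indexed view
       SUM(ISNULL(Amount, 0)) AS TotalAmount   -- SUM must be over a non-nullable expression
FROM dbo.Sales
GROUP BY ProductId;
GO

-- The first index on a view must be unique and clustered:
CREATE UNIQUE CLUSTERED INDEX IX_vSalesTotals
    ON dbo.vSalesTotals (ProductId);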
A couple other considerations: While performance between an SP and a view are essentially the same (given they are performing the exact same select), the SP gives you more flexibility for that same query.
The SP will support ordering the result set; i.e., including an ORDER BY statement. You cannot do so in a view.
The SP is fully compiled and requires only an exec to invoke it. The view still requires a SELECT * FROM view to invoke it; i.e., a select on the compiled select in the view.
Found a detailed performance analysis: https://www.scarydba.com/2016/11/01/stored-procedures-not-faster-views/
Compile Time Comparison:
There is a difference in the compile time between the view by itself and the stored procedures (the two stored procedures were almost identical to each other). Let's look at performance over a few thousand executions:
View AVG: 210.431431431431
Stored Proc w/ View AVG: 190.641641641642
Stored Proc AVG: 200.171171171171
This is measured in microseconds, so the variation we're seeing is likely just some disparity in I/O, CPU or something else, since the differences are trivial at around 10 microseconds, or 5%.
What about execution time including compile time, since there is a difference:
Query duration View AVG: 10089.3226452906
Stored Proc AVG: 9314.38877755511
Stored Proc w/ View AVG: 9938.05410821643
Conclusion:
With the exception of the differences in compile time, we see that views actually perform exactly the same as stored procedures, if the query in question is the same.