Code reuse and modularity in SQL - sql

Is code reuse and modulatiry a good idea for SQL Stored Procedures programming?
And if so, what's the best way to add these features to a SQL stored procedure code base?
I usually create scalar valued functions for tasks that are common and repeated. I find that it eases both development of new procedures similar to existing ones, but also aids a lot in bugtracking and troubleshooting.
I try to stay away from table valued functions though, due to performance issues.
My rule of thumb is that if it is a calculation, and it's used in several places, then I create a scalar valued function.

You are going to find that using functions within your queries is a disaster for performance. The functions become a black box for the optimizer, so you will end up re-coding the function call back into the query to make it run fast once you get up to a large number of rows in your tables.
A better way to deal with common calculations is to insert them into a new column with a trigger, or in your insert/update queries. That way you can index the calculated value and use it directly instead of figuring it out each time you need it.

Sql doesn't give you a lot of flexibility when it comes to code reuse. I usually create functions when it comes to calculations or other tasks that don't involve modifying tables. But all tasks that involve writing to tables and that sort of things I usually use a stored procedure to get a better control of the transactions.

You can break code into separate stored procedures to help break down complex stored procs into more manageable chunks. You can also do the same to break out common logic that won't work in a function. Think of it similar to an Extract Method refactoring.

Another way to look at it from the applications side is to use binding to reuse your SQL queries. But that's probably not what your looking for

To follow up on this, I did run into some performance problems, and it seems that the optimizer is not able to pick the correct index for code inside the functions.
So I had to specify the correct index using index hints (with keyword), to solve the performance issue.

Related

Does Table-Valued Function (SQL) create table on each call? [performance]

Okay this might sound a noob question, but SQL isn't my really strength, so I am requesting some help here.
I am trying to implement something, but I am concerned about performance issues.
The problem I am trying to fix is something like this:
I have a column with a lot of data separated by commas ","
Something like this: data1,data2,data3,data57
What I need is looping through each piece of data separated by commas for all the records, and then do something with that single piece data, do you get it?
I found a solution that can actually help me, but I am worried about system performance, because I might need to make multiple calls to this function using different parameters!
Does a table is created on each call I made to the Table-Valued Function (UDF) or does the sql server saves it as cache? [maybe I would rather need a temporary table?]
Thank you for your help in advance!
Note: The data is not mine, and I should use it as is, so suggesting to change the database is out of question (however I know that would be the best scenario).
a
Note2: The purpose of this question/problem is to import initial data to the database, performance may not be a serious problem since it won't run many times, but still I wanna regard that issue, and do it the best way I can!
User defined, table-valued functions that are composed of multiple statements, as the one you found is, will create an object in the tempdb system database, populate it and then dispose of it when the object goes out of scope.
If you want to run this multiple times over the same parameters, you might consider creating a table variable and caching the result in that yourself. If you're going to be calling it on different lists on comma-separated values though, there's not a great way of avoiding the overhead. SQL Server isn't really built for lots of string manipulation.
Generally, for one-off jobs, the performance implications of this tempdb usage is not going to be a major concern for you. It's more concerning when it's a common pattern in the day-to-day of the database life.
I'd suggest trying, if you can, on a suitably sized subset of the data to gauge the performance of your solution.
Since you say you're on SQL Server 2016, you can make use of the new STRING_SPLIT function, something like
SELECT t.Column1, t.Column2, s.value
FROM table t
CROSS APPLY STRING_SPLIT(t.CsvColumn, ',') s
May get you close to where you want, without the need to define a new function. Note, your database needs to be running under the 2016 compatibility level (130) for this to be available, simply running on SQL 2016 isn't enough (they often do this with new features to avoid the risk of backwards-compatibility-breaking changes).

Is this a valid benefit of using embedded SQL over stored procedures?

Here's an argument for SPs that I haven't heard. Flamers, be gentle with the down tick,
Since there is overhead associated with each trip to the database server, I would suggest that a POSSIBLE reason for placing your SQL in SPs over embedded code is that you are more insulated to change without taking a performance hit.
For example. Let's say you need to perform Query A that returns a scalar integer.
Then, later, the requirements change and you decide that it the results of the scalar is > x that then, and only then, you need to perform another query. If you performed the first query in a SP, you could easily check the result of the first query and conditionally execute the 2nd SQL in the same SP.
How would you do this efficiently in embedded SQL w/o perform a separate query or an unnecessary query?
Here's an example:
--This SP may return 1 or two queries.
SELECT #CustCount = COUNT(*) FROM CUSTOMER
IF #CustCount > 10
SELECT * FROM PRODUCT
Can this/what is the best way to do this in embedded SQL?
A very persuasive article
SQL and stored procedures will be there for the duration of your data.
Client languages come and go, and you'll have to re-implement your embedded SQL every time.
In the example you provide, the time saved is sending a single scalar value and a single follow-up query over the wire. This is insignificant in any reasonable scenario. That's not to say there might not be other valid performance reasons to use SPs; just that this isn't such a reason.
I would generally never put business logic in SP's, I like them to be in my native language of choice outside the database. The only time I agree SPs are better is when there is a lot of data movement that don't need to come out of the db.
So to aswer your question, I'd rather have two queries in my code than embed that in a SP, in my view I am trading a small performance hit for something a lot more clear.
How would you do this efficiently in
embedded SQL w/o perform a separate
query or an unnecessary query?
Depends on the database you are using. In SQL Server, this is a simple CASE statement.
Perhaps include the WHERE clause in that sproc:
WHERE (all your regular conditions)
AND myScalar > myThreshold
Lately I prefer to not use SPs (Except when uber complexity arises where a proc would just be better...or CLR would be better). I have been using the Repository pattern with LINQ to SQL where my query is written in my data layer in a strongly typed LINQ expression. The key here is that the query is strongly typed which means when I refactor I am refactoring properties of a class that is directly generated from the database table (which makes changes from the DB carried all the way forward super easy and accurate). While my SQL is generated for me and sent to the server I still have the option of sticking to DRY principles as the repository pattern allows me to break things down into their smallest component. I do have the issue that I might make a trip to the server and based on the results of query I may find that I need to make another trip to the server. I don't worry about this up front. If I find later that it becomes an issue then I may refactor that code into something more performant. The over all key here is that there is no one magic bullet. I tend to work on greenfield applications which allows this method of development to be most efficient for me.
Benefits of SPs:
Performance (are precompiled)
Easy to change (without compiling the application)
SQL set based features make very easy doing really difficult data tasks
Drawbacks:
Depend heavily on the database engine used
Makes deployment of upgrades a little harder (you have to deploy the App + the scripts)
My 2 cents...
About your example, it can be done like this:
select * from products where (select count(*) from customers>10)

Stored procedures or inline queries?

First of all there is a partial question regarding this, but it is not exactly what I'm asking, so, bear with me and go for it.
My question is, after looking at what SubSonic does and the excellent videos from Rob Connery I need to ask: Shall we use a tool like this and do Inline queries or shall we do the queries using a call to the stored procedure?
I don't want to minimize any work from Rob (which I think it's amazing) but I just want your opinion on this cause I need to start a new project and I'm in the middle of the line; shall I use SubSonic (or other like tool, like NHibernate) or I just continue my method that is always call a stored procedure even if it's a simple as
Select this, that from myTable where myStuff = StackOverflow;
It doesn't need to be one or the other. If it's a simple query, use the SubSonic query tool. If it's more complex, use a stored procedure and load up a collection or create a dataset from the results.
See here: What are the pros and cons to keeping SQL in Stored Procs versus Code and here SubSonic and Stored Procedures
See answers here and here. I use sprocs whenever I can, except when red tape means it takes a week to make it into the database.
Stored procedures are gold when you have several applications that depend on the same database. It let's you define and maintain query logic once, rather than several places.
On the other hand, it's pretty easy for stored procedures themselves to become a big jumbled mess in the database, since most systems don't have a good method for organizing them logically. And they can be more difficult to version and track changes.
I wouldn't personally follow rigid rules. Certainly during the development stages, you want to be able to quickly change your queries so I would inline them.
Later on, I would move to stored procedures because they offer the following two advantages. I'm sure there are more but these two win me over.
1/ Stored procedures group the data and the code for manipulating/extracting that data at one point. This makes the life of your DBA a lot easier (assuming your app is sizable enough to warrant a DBA) since they can optimize based on known factors.
One of the big bugbears of a DBA is ad-hoc queries (especially by clowns who don't know what a full table scan is). DBAs prefer to have nice consistent queries that they can tune the database to.
2/ Stored procedures can contain logic which is best left in the database. I've seen stored procs in DB2/z with many dozens of lines but all the client has to code is a single line like "give me that list".
Because the logic for "that list" is stored in the database, the DBAs can modify how it's stored and extracted at will without compromising or changing the client code. This is similar to encapsulation that made object-orientd languages 'cleaner' than what came before.
I've done a mix of inline queries and stored procedures. I prefer more of the stored procedure/view approach as it gains a nice spot for you to make a change if needed. When you have inline queries you always have to go and change the code to change an inline query and then re-roll the application. You also might have the inline query in multiple places so you would have to change a lot more code than with one stored procedure.
Then again if you have to add a parameter to a stored procedure, your still changing a lot of code anyways.
Another note is how often the data changes behind the stored procedure, where I work we have third party tables that may break up into normalized tables, or a table becomes obsolete. In that case a stored procedure/view may minimize the exposure you have to that change.
I've also written a entire application without stored procedures. It had three classes and 10 pages, was not worth it at all. I think there comes a point when its overkill, or can be justified, but it also comes down to your personal opinion and preference.
Are you going to only ever access your database from that one application?
If not, then you are probably better off using stored procedures so that you can have a consistent interface to your database.
Is there any significant cost to distributing your application if you need to make a change?
If so, then you are probably better off using stored procedures which can be changed at the server and those changes won't need to be distributed.
Are you at all concerned about the security of your database?
If so, then you probably want to use stored procedures so that you don't have to grant direct access to tables to a user.
If you're writing a small application, without a wide audience, for a system that won't be used or accessed outside of your application, then inline SQL might be ok.
With Subsonic you will use inline, views and stored procedures. Subsonic makes data access easier, but you can't do everthing in a subsonic query. Though the latest version, 2.1 is getting better.
For basic CRUD operations, inline SQL will be straight forward. For more complex data needs, a view will need to be made and then you will do a Subsonic query on the view.
Stored procs are good for harder data computations and data retrieval. Set based retrieval is usually always faster then procedural processing.
Current Subsonic application uses all three options with great results.
I prefer inline sql unless the stored procedure has actual logic (variables, cursors, etc) involved. I have been using LINQ to SQL lately, and taking the generated classes and adding partial classes that have some predefined, common linq queries. I feel this makes for faster development.
Edit: I know I'm going to get downmodded for this. If you ever talk down on foreign keys or stored procedures, you will get downmodded. DBAs need job security I guess...
The advantages of stored procedure (to my mind)
The SQL is in one place
You are able to get query plans.
You can modify the database structure if necessary to improve performance
They are compiled and thus those query plans do not have to get constructed on the fly
If you use permissions - you can be sure of the queries that the application will make.
Stored procedures group the data and the code for manipulating/extracting that data at one point. This makes the life of your DBA a lot easier (assuming your app is sizable enough to warrant a DBA) since they can optimize based on known factors.
Stored procedures can contain logic which is best left in the database. I've seen stored procs in DB2/z with many dozens of lines but all the client has to code is a single line like "give me that list".
the best advantage of using stored procs i found is that when we want to change in the logic, in case of inline query we need to go to everyplace and change it and re- roll the every application but in the case of stored proc change is required only at one place.
So use inline queries when you have clear logic; otherwise prefer stored procs.

Is it better to write a more targeted stored procedure with fewer parameters?

Say I have a stored procedure that returns data from a SELECT query. I would like to get a slightly different cut on those results depending on what parameters I pass through. I'm wondering whether it is better design to have multiple stored procedures that take one or no parameters to do this (for example, GetXByDate or GetXByUser), or one stored procedure with multiple parameters that does the lot (for example, GetX)?
The advantage of the first option is that it's simpler and maybe faster, but disadvantage is that the essence of the query is duplicated across the stored procedures and needs to be maintained in several places.
The advantage of the second option is that the query is only present once, but the disadvantage is that the query is more complex and harder to troubleshoot.
What do you use in your solutions and why?
The more complex stored procedures are more complex for the SQL server to compile
correctly and execute quickly and efficiently.
Even in the big stored procedure you have to either have to have several copies of the query or add lots of CASEs and IFs in it which reduce performance. So you don't really gain much from lumping everything together.
From my personal experience I also consider large SQL sp code with lots of branches more difficult to maintain that several smaller and straightforward sprocs.
You could consider using views and UDFs to reduce copy-pasting of the query code.
Saying that if you don't care about performance (intranet app, the queries are not that heavy, don't run that often) you might find having a universal sproc quite handy.
I would treat stored procedures much in the same way as I would a method on a class. It ought to do one thing and do it simply. Consider applying the same sorts of refactoring/code smell rules to your stored procedures that you would to your application code.
I prefer GetXByDate, GetXByUser, ... for simple stored procedures, on the basis they will require little maintenance anyway, and in this situation I think it is easier to maintain duplicate code than complicated code.
Of course, if you have more complicated stored procedures, this may not be true. GetAndProcessXByDate may be better reduced to GetXByDate, GetXByUser, ... which call another stored proc ProcessX.
So I guess the definitive answer is: it depends... :)
I second #tvanfosson.
However, I would add that you can do both: have a multi-use sproc (e.g. GetX) which contains the essential logic for a whole class of queries, and wrap it up in a series of smaller sprocs (GetXY, GetXZ) which execute the big one, passing in the appropriate parameters.
This means that you Don't Repeat Yourself, but you can also provide a simple interface to the client apps: an app which only ever calls GetXY doesn't have to know about GetXZ.
We use this approach sometimes.
One advantage of the single stored proc if you're using a generated C# data access layer like LinqToSQL a single class is generated to represent your resultset.
AJs approach gives you the best of both worlds. The pain of having to maintain repeated code across several sprocs cannot be overstated.
Build Sproc and UDF modules for common use, and call them from task-specific sprocs.
Who/what will be calling these stored procedures? I wouldn't write stored procedures for SELECT statements normally, precisely because there are lots of different SELECT statements you might want, including joins to other tables etc.

Why is parameterized SQL generated by NHibernate just as fast as a stored procedure?

One of my co-workers claims that even though the execution path is cached, there is no way parameterized SQL generated from an ORM is as quick as a stored procedure. Any help with this stubborn developer?
I would start by reading this article:
http://decipherinfosys.wordpress.com/2007/03/27/using-stored-procedures-vs-dynamic-sql-generated-by-orm/
Here is a speed test between the two:
http://www.blackwasp.co.uk/SpeedTestSqlSproc.aspx
Round 1 - You can start a profiler trace and compare the execution times.
For most people, the best way to convince them is to "show them the proof." In this case, I would create a couple basic test cases to retrieve the same set of data, and then time how long it takes using stored procedures versus NHibernate. Once you have the results, hand it over to them and most skeptical people should yield to the evidence.
I would only add a couple things to Rob's answer:
First, Make sure the amount of data involved in the test cases is similiar to production values. In other words if your queries are normally against tables with hundreds of thousands or rows, then create such a test environment.
Second, make everything else equal except for the use of an nHibernate generated query and a s'proc call. Hopefully you can execute the test by simply swapping out a provider.
Finally, realize that there is usually a lot more at stake than just stored procedures vs. ORM. With that in mind the test should look at all of the factors: execution time, memory consumption, scalability, debugging ability, etc.
The problem here is that you've accepted the burden of proof. You're unlikely to change someone's mind like that. Like it or not, people--even programmers-- are just too emotional to be easily swayed by logic. You need to put the burden of proof back on him- get him to convince you otherwise- and that will force him to do the research and discover the answer for himself.
A better argument to use stored procedures is security. If you use only stored procedures, with no dynamic sql, you can disable SELECT, INSERT, UPDATE, DELETE, ALTER, and CREATE permissions for the application database user. This will protect you against most 2nd order SQL Injection, whereas parameterized queries are only effective against first order injection.
Measure it, but in a non-micro-benchmark, i.e. something that represents real operations in your system. Even if there would be a tiny performance benefit for a stored procedure it will be insignificant against the other costs your code is incurring: actually retrieving data, converting it, displaying it, etc. Not to mention that using stored procedures amounts to spreading your logic out over your app and your database with no significant version control, unit tests or refactoring support in the latter.
Benchmark it yourself. Write a testbed class that executes a sampled stored procedure a few hundred times, and run the NHibernate code the same amount of times. Compare the average and median execution time of each method.
It is just as fast if the query is the same each time. Sql Server 2005 caches query plans at the level of each statement in a batch, regardless of where the SQL comes from.
The long-term difference might be that stored procedures are many, many times easier for a DBA to manage and tune, whereas hundreds of different queries that have to be gleaned from profiler traces are a nightmare.
I've had this argument many times over.
Almost always I end up grabbing a really good dba, and running a proc and a piece of code with the profiler running, and get the dba to show that the results are so close its negligible.
Measure it.
Really, any discussion on this topic is probably futile until you've measured it.
He may be correct for the specific use case he is thinking of. A stored procedure will probably execute faster for some complex set of SQL, that can be arbitrarily tuned. However, something you get from things like hibernate is caching. This may prove much faster for the lifetime of your actual application.
The additional layer of abstraction will cause it to be slower than a pure call to a sproc. Just by the fact that you have additional allocations on the managed heap, and additional pushes and pops off the callstack, the truth of the matter is that it is more efficient to call a sproc over having an ORM build the query, regardless how good the ORM is.
How slow, if its even measurable, is debatable. This is also helped by the fact that most ORM's have a caching mechanism to avoid doing the query at all.
Even if the stored procedure is 10% faster (it probably isn't), you may want to ask yourself how much it really matters. What really matters in the end, is how easy it is to write and maintain code for your system. If you are coding a web app, and your pages all return in 0.25 seconds, then the extra time saved by using stored procedures is negligible. However, there can be many added advantages of using an ORM like NHibernate, which would be extremely hard to duplicate using only stored procedures.