Metaprogramming with stored procedures? - SQL

This is going to be both a direct question and a discussion point. I'll ask the direct question first:
Can a stored procedure create another stored procedure dynamically? (Personally I'm interested in SQL Server 2008, but in the interests of wider discussion will leave it open)
Now to the reason I'm asking. Briefly (you can read more elsewhere), User Defined Scalar Functions in SQL Server are performance bottlenecks, at best. I've seen uses in our code base that slow the total query down by 3-4x, but from what I've read the local impact of the S-UDF can be 10x+
However, UDFs are, potentially, great for raising abstraction levels, reducing lots of tedious boilerplate, centralising logic rules etc. In most cases they boil down to simple expressions that could easily be expanded inline - but they're not (I'm really only thinking of non-querying functions - e.g. string manipulations). I've seen a bug report for this to be addressed in a future release - with some buy-in from MS. But for now we have to live with the (IMHO) broken implementation.
One workaround is to use a table-valued UDF instead - however these complicate the client code in ways you don't always want to deal with (esp. when the UDF just computes the result of an expression).
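For illustration, here is a minimal sketch of that workaround with hypothetical names: the same expression packaged as an inline table-valued function, which the caller then has to consume through CROSS APPLY instead of a plain scalar call.

CREATE FUNCTION dbo.fn_CleanPhone (@raw varchar(50))
RETURNS TABLE
AS
RETURN (SELECT REPLACE(REPLACE(@raw, '-', ''), ' ', '') AS CleanValue);
GO

-- The caller now needs an APPLY, which is the extra complication:
SELECT c.CustomerID, p.CleanValue
FROM dbo.Customers AS c
CROSS APPLY dbo.fn_CleanPhone(c.Phone) AS p;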
So my crazy idea, at first, was to write the procs with C Preprocessor directives, then pass it through a preprocessor before submitting to the RDBMS. This could work, but has its own problems.
That led me to my next crazy idea, which was to define the "macros" in the DB itself, and have a master proc that accepts a string containing an unprocessed SP with macros, expands the macros inline, then submits it on to the RDBMS. This is not what SPs are good at, but I think it could work - assuming you can do this in the first place - hence my original question.
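To answer the direct question up front: yes, on SQL Server a procedure can build a CREATE PROCEDURE statement as a string and submit it via dynamic SQL. A rough sketch of the idea (the Macros table and all names are hypothetical, and the expansion is deliberately naive):

CREATE PROCEDURE dbo.usp_CreateProcFromTemplate
    @Template nvarchar(max)   -- proc source containing $MACRO$ tokens
AS
BEGIN
    DECLARE @Sql nvarchar(max) = @Template;

    -- Expand every stored macro token into its definition.
    SELECT @Sql = REPLACE(@Sql, m.Token, m.Expansion)
    FROM dbo.Macros AS m;     -- hypothetical table: Token, Expansion

    -- CREATE PROCEDURE must be the only statement in its batch,
    -- so the expanded text is submitted via sp_executesql.
    EXEC sys.sp_executesql @Sql;
END;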
However, now I have explained my path to the question, I'd also like to leave it open for other ideas. I'm sure I'm not the only one who has been thinking along these lines. Perhaps there are third-party solutions already out there? My googling has not turned up much yet.
Also I thought it would be a fun discussion topic.
[edit]
This blog post I found in my research describes the same issue I'm seeing. I'd be happy if anyone could point out something that I, or the blog poster, might be doing wrong that leads to the overhead.
I should also add that I am using WITH SCHEMABINDING on my S-UDF, although it doesn't seem to be giving me any advantage

Your string-processing UDF won't be a perf problem. Scalar UDFs are a problem only when they perform SELECTs, and those SELECTs are done for every row; this in turn spikes the I/O.
String manipulation, on the other hand, is done in memory and is fast.
As for your idea, I can't really see any benefit to it. Creating and dropping objects like that can be an expensive operation and may lead to schema locking.

Related

Hard Coding or Magic Literals improve performance in SQL Server Stored Procedures?

From day 1 of my programming career I have always been told hard-coding/magic numbers or literals are bad. One should avoid it. I followed this principle even in Stored Procedures in SQL Server.
Recently I came across this article by Ian Jose and it came as a rude shock to me. He has nicely explained the problem with this practice in SQL Server Stored Procedures with evidence. Ian has demonstrated that the performance takes a big hit with this approach.
See this article by Chris Hedgate to know what my reaction was when I read Ian's blog.
I understand that if it is a smaller code snippet and the hard-coded value is not used more than once, some people take the liberty of keeping it instead of declaring a variable or constant and using that instead. But should hard coding actually be encouraged in stored procedures?
I did find a similar discussion on SO but that thread isn't evaluating the performance hit, rather it is talking about how to avoid hard-coding, something that even I used to consider myself very good at. But now after reading Ian's blog, I feel I am adding more problem there than solving it.
I would like to request the other experts to share their experience on this topic, as this seems to be a violation of fundamental programming guidelines. Could this be because of the way SQL Server is designed? Is this not an issue in Oracle or MySQL? Oracle in fact suggests these 3 nice approaches to avoid hard coding.
Edit: Thank you all for your valuable comments. I would like to request some more statistics to constructively challenge Ian's numbers, and to see whether avoiding hard coding really has as bad an impact on performance as Ian demonstrated. I am sorry if I am asking for too much and eagerly look forward to your responses.
I had also opened a thread on LinkedIn as I didn't get many responses here at first. For people who have the same confusion as me can refer to that for some more info.
Here's how I understand the problem:
Hard-coded literals in a DBMS (SQL Server / Oracle) do improve performance for the individual statement.
However, as soon as you execute the query again with a different hard-coded literal, it must be recompiled as a new plan, which fills up the plan cache.
This means that the cache can't be used efficiently for the rest of the DB.
So, whilst:
SELECT * FROM X WHERE Y = '123'
Might be slightly faster than:
SELECT * FROM X WHERE Y = #myVar
If you then want to do:
SELECT * FROM X WHERE Y = '234'
The DBMS sees that as a different query, so another plan goes into the cache. Fill the cache with one-off plans like this and plans that should stay cached get pushed out, meaning the queries that relied on them run slower.
Parameters are better, since the same plan can be used for multiple parameter values. An SP in SQL Server is a contract with the app writer that one plan can be used for multiple invocations. It has been eight years since I worked in SQL Server, and much may have changed. However, the optimizer will "sniff" the parameter value for optimization purposes. It is nice if that value is "typical". Unfortunately, a recompile can happen at any time, and hence with any parameter value. Plan guides are one technique to make an application immune to this danger. There are likely other techniques as well.
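To make the contrast concrete, here is a sketch of the parameterized form of the ad-hoc statement above (same hypothetical X/Y), which lets '123' and '234' share one cached plan:

DECLARE @y varchar(10) = '123';
EXEC sys.sp_executesql
    N'SELECT * FROM X WHERE Y = @y',   -- one plan, reused for every value
    N'@y varchar(10)',
    @y = @y;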

Prevent use of pre ANSI-92 old syntax

I wonder if there's a way to prevent the creation of objects that contain the old pre-ANSI join syntax, maybe with server triggers. Can anyone help me?
You can create a DDL trigger and mine the eventdata() XML for the content of the proc. If you can detect the old syntax using some fancy string-parsing functions (maybe looking for commas between known table names or looking for *= or =*), then you can roll back the creation of the proc or function.
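A rough sketch of that DDL trigger (the string check is deliberately naive and would need hardening against comments and string literals):

CREATE TRIGGER ddl_block_old_join_syntax
ON DATABASE
FOR CREATE_PROCEDURE, ALTER_PROCEDURE, CREATE_VIEW, ALTER_VIEW,
    CREATE_FUNCTION, ALTER_FUNCTION
AS
BEGIN
    DECLARE @cmd nvarchar(max) =
        EVENTDATA().value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'nvarchar(max)');

    IF @cmd LIKE '%*=%' OR @cmd LIKE '%=*%'
    BEGIN
        RAISERROR('Old-style outer join syntax (*= / =*) is not allowed.', 16, 1);
        ROLLBACK;
    END
END;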
First reaction - code reviews and a decent QA process!
I've had some success looking at sys.syscomments.text. A simple where text like '%*=%' should do. Be aware that long SQL strings may be split across multiple rows. I realise this won't prevent objects getting in there in the first place. But then DDL triggers won't tell you how big your current problem is.
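A sketch of that audit query; because syscomments splits long definitions into 4000-character chunks, a match can straddle rows, so treat a clean result with some suspicion:

SELECT OBJECT_NAME(c.id) AS object_name, c.colid, c.text
FROM sys.syscomments AS c
WHERE c.text LIKE '%*=%' OR c.text LIKE '%=*%'
ORDER BY OBJECT_NAME(c.id), c.colid;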
Although I fully understand your effort, I believe that this type of action is the wrong way of getting where you want. First of all, you might get into serious trouble with your boss and, depending on where you work, get fired.
Second, as stated before, do code reviews and explain why the old syntax sucks. You have to have a decent reason why one should avoid the *= stuff; 'because you don't like it' is not a feasible argument. In fact, there are quite a few articles around showing that certain problems simply cannot be expressed correctly with this syntax.
Third, you might want to point out that separating join conditions (JOIN ... ON ...) from filtering conditions (WHERE ...) increases readability and might therefore be an option.
Collect your arguments and convince your colleagues rather than punishing them in quite an arrogant way.

How do you think while formulating SQL queries? Is it experience or a concept?

I have been working on SQL Server and front-end coding, and I have usually had trouble formulating queries.
I understand most of the SQL concepts needed for formulating queries, but whenever some new functionality comes up that could be done with a SQL query, I usually fail to work it out.
I am very comfortable with SELECT queries using joins and the like, but when it comes to DML operations I usually struggle.
For every query I have never written before, I feel uncomfortable while creating it. Whenever I go for an interview I face this problem.
Is there some concept behind the approach to formulating SQL queries?
E.g.
I need to create a SQL query such that:
A table contains a single column with duplicate records. I need to remove the duplicate records.
I know I could find the solution to this query very easily by Googling, but I want to know how everyone arrives at the desired result.
Is it a case of "practice makes perfect", i.e. once you have done it, next time you will be able to formulate it, or is there some logic or concept behind it?
I could have got my answer to the above problem simply by posting it on Stack Overflow, and I would have had an answer within 5 to 10 minutes, but I want to know the reasoning. How do you approach a new kind of query? Is it mainly a matter of experience, or of applying concepts?
Whenever I learn something new in coding I try to use it wherever I can. But here the scenario seems different, because perhaps I am lagging in some concepts.
EDIT
How could I test my knowledge and concepts in SQL and related SQL queries?
Typically, the first time you need to open a child proof bottle of pills, you have a hard time, but after that you are prepared for what it might/will entail.
So it is with programming (me thinks).
You find problems, research best practices, and beat your head against a couple of rocks, but in the process you will come to have a handy set of tools.
Also, reading what others tried/did is a good way to avoid major obstacles.
All in all, with a lot of practice/coding, you will see patterns quicker, and learn to notice where to make use of what tool.
I have a somewhat methodical method of constructing queries in general, and it is something I use elsewhere with any problem solving I need to do.
The first step is ALWAYS listing out any bits of information I have in a request. Information is essentially anything that tells me something about something.
A table contains a single column with duplicate records. I need to remove the duplicates.
I have a table (I'll call it table1)
I have a column on table table1 (I'll call it col1)
I have duplicates in col1 on table table1
I need to remove the duplicates.
The next step of my query construction is identifying the action I'll take from the information I have.
I'll look for certain keywords (e.g. remove, create, edit, show, etc...) along with the standard insert, update, delete to determine the action.
In the example this would be DELETE because of remove.
The next step is isolation.
Answer the question "the action determined above should only be valid for ______..?" This part is almost always the most difficult part of constructing any query because it's usually abstract.
In the above example you're listing "duplicate records" as a piece of information, but that's really an abstract concept of something (anything where a specific value is not unique in usage).
Isolation is also where I test my action using a SELECT statement.
Every new query I run gets thrown through a select first!
The next step is execution, or essentially the "how do I get this done" part of a request.
A lot of times you'll figure the how out during the isolation step, but in some instances (yours included) how you isolate something, and how you fix it is not the same thing.
Showing duplicated values is different than removing a specific duplicate.
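To illustrate with the hypothetical table1/col1 above: the isolation step is a SELECT that shows the duplicated values, and the execution step is the DELETE that keeps one row per value (SQL Server syntax; this sketch assumes no other columns distinguish the rows):

-- Isolation: which values appear more than once?
SELECT col1, COUNT(*) AS cnt
FROM table1
GROUP BY col1
HAVING COUNT(*) > 1;

-- Execution: remove all but one row per duplicated value.
WITH numbered AS (
    SELECT col1, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col1) AS rn
    FROM table1
)
DELETE FROM numbered
WHERE rn > 1;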
The last step is implementation. This is just where I take everything and make the query...
Summing it all up... for me to construct a query I'll pick out all information that I have in the request. Using the information I'll figure out what I need to do (the action), and what I need to do it on (isolation). Once I know what I need to do with what I figure out the execution.
Every single time I'm starting a new "query" I'll run it through these general steps to get an idea for what I'm going to do at an abstract level.
For specific implementations of an actual request you'll have to have some knowledge (or access to google) to go further than this.
Kris
I think in the same way I cook dinner. I have some ingredients (tables, columns etc.), some cooking methods (SELECT, UPDATE, INSERT, GROUP BY etc.) then I put them together in the way I know how.
Sometimes I will do something weird and find it tastes horrible, or that it is amazing.
Occasionally I will pick up new recipes from the internet or friends, then use parts of these in my own.
I also save my recipes in handy repositories, broken down into reusable chunks.
On the "Delete a duplicate" example, I'd come to the result by googling it. This scenario is so rare if the DB is designed properly that I wouldn't bother keeping this information in my head. Why bother, when there is a good resource is available for me to look it up when I need it?
For other queries, it really is practice makes perfect.
Over time, you get to remember frequently used patterns just because they ARE frequently used. Rare cases should be kept in a reference material. I've simply got too much other stuff to remember.
Find good documentation for your software. I use MySQL a lot, and MySQL has an excellent documentation site with a decent search function, so you get many answers just by reading the docs. If you do NOT get your answer, at least you are learning something.
Then I set up an example database (or use the one I am working on) and gradually build my SQL. I tend to separate the problem into small pieces and solve it step by step; this works very well if you are building queries with many JOINs. It is best to start with some particular case and "pollute" your SQL with conditions like WHERE id = '123', which you take out as you work towards your solution.
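A small sketch of that scaffolding (hypothetical tables): pin the query to one known row while the joins are being built, then remove the temporary condition at the end.

SELECT o.id, c.name, p.title
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN products  p ON p.id = o.product_id
WHERE o.id = '123';   -- temporary "pollution", dropped once the joins are right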
The best and fastest way to learn good SQL is to work with someone else, preferably someone who knows more than you, but that is not a necessary condition. It can be replaced by studying mature code written by others.
Your example is a test of how well you understand the DISTINCT keyword and the GROUP BY clause, which are SQL's ways of dealing with duplicate data.
Examples and experience. You look at other people's examples and you create your own code, and once it clicks, you don't need to think about it again.
I would have a look at the Mere Mortals book - I think it's the one by Hernandez. I remember that when I first started seriously with SQL Server 6.5, moving from manual ISAM databases and Access database systems using VB4, that it was difficult to understand the syntax, the joins and the declarative style. And the SQL queries, while powerful, were very intimidating to understand - because typically, I was looking at generated code in Microsoft Access.
However, once I had developed a relatively systematic approach to building queries in a consistent and straightforward fashion, my skills and confidence quickly moved forward.
From seeing your responses you have two options.
Have a copy of the specification for whatever you're working on (the SQL spec and the documentation for the SQL implementation: SQLite, SQL Server, etc.).
Use Google, SO, Books, etc.. as a resource to find answers.
You can't formulate an answer to a problem without doing one of the above. The first option is to become well versed into the capabilities of whatever you are working on.
The second option allows you to find answers that you may not even fully know how to ask for. Your example is fairly simplistic, so if you read the spec/implementation documentation you would know the answer right away. But there are times where, even if you read the spec/documentation, you don't know the answer. You only know that it IS possible, just not how to do it.
Remember that as far as jobs and supervisors go, being able to resolve a problem is important, but the faster you can do it the better, and that is often where option 2 wins.

How do I 'refactor' SQL Queries?

I have several MS Access queries (in views and stored procedures) that I am converting to SQL Server 2000 (T-SQL). Due to Access's limitations regarding sub-queries, and or the limitations of the original developer, many views have been created that function only as sub-queries for other views.
I don't have a clear business requirements spec, except to 'do what the Access application does', and half a page of notes on reports/CSV extracts, but the Access application doesn't even do what I suspect is required properly.
I, therefore, have to take a bottom up approach, and 'copy' the Access DB to T-SQL, where I would normally have a better understanding of requirements and take a top down approach, creating new queries to satisfy well defined requirements.
Is there a method I can follow in doing this? Do I spread it all out and spend a few days 'grokking' it, or do I continue just copying the Access views and adopt an evolutionary approach to optimising the querying?
Work out what access does with the queries, and then use this knowledge to check that you've transferred it properly. Only once you've done this can you think about refactoring. I'd start with slow queries and then go from there: work out what indexes you need and then progressively rewrite. This way you can deliver as soon as you've proved that you moved everything successfully (even if it is potentially a bit slower). That's much better than not being able to deliver at all because problem X came along.
I'd probably start with the Access database, exercise the queries in situ and see what the resultset is. Often you can understand what the query accomplishes and then work back to your own design to accomplish it. (To be thorough, you'll need to understand the intent pretty completely anyway.) And that sounds like the best statement of requirements you're going to get - "Just like it's implemented now."
Other than that, your approach is the best I can think of. Once they are in SQL Server, just start testing and grokking.
When you are dealing with a problem like this it's often helpful to keep things working as they are while you make incremental changes. This is better from a risk management perspective.
I'd concentrate on getting it working, then checking the database performance and optimizing performance problems. Then, as you add features and fix bugs, clean up the code that's hard to maintain. As you said, a sub-query is really very similar to a view. So if it's not broken you may not need to change it.
This depends on your timeline. If you have to get the project running absolutely as soon as possible (I know this is true for EVERY project, but if it's REALLY true for you), then yes, duplicate the functionality and infrastructure from Access then do your refactoring either later or as you go.
If you have SOME time you can dedicate to it, then refactoring it now will give you two things:
You'll be happier with the code, and it will (likely) perform better, since actual analysis was done rather than the transcoding equivalent of a copy-paste
You'll likely gain a greater understanding of what the true business rules are, since you'll almost certainly come across things that aren't in the spec (especially considering how you describe them)
I would recommend copying the views to SQL Server immediately, and then use its sophisticated tools to help you grok them.
For example, SQL Server can tell you what views, stored procedures, etc, rely on a particular view, so you can see from there whether the view is a one-of or if it's actually used in more than one place. It will help you determine which views are more important than which.
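For example, a quick pass on SQL Server 2000 with a hypothetical view name (sp_depends is not always complete, but it is a useful first look):

-- What does this view depend on, and what references it?
EXEC sp_depends 'dbo.vwSomeAccessView';

-- Or query sysdepends directly for the objects that reference it.
SELECT DISTINCT OBJECT_NAME(d.id) AS referencing_object
FROM sysdepends AS d
WHERE d.depid = OBJECT_ID('dbo.vwSomeAccessView');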

Languages other than SQL in postgres

I've been using PostgreSQL a little bit lately, and one of the things that I think is cool is that you can use languages other than SQL for scripting functions and whatnot. But when is this actually useful?
For example, the documentation says that the main use for PL/Perl is that it's pretty good at text manipulation. But isn't that more of something that should be programmed into the application?
Secondly, is there any valid reason to use an untrusted language? It seems like making it so that any user can execute any operation would be a bad idea on a production system.
PS. Bonus points if someone can make PL/LOLCODE seem useful.
#Mike: this kind of thinking makes me nervous. I've heard "this should be infinitely portable" too many times, but when you ask "do you actually foresee any porting?", the answer is: no.
Sticking to the lowest common denominator can really hurt performance, as can the introduction of abstraction layers (ORM's, PHP PDO, etc). My opinion is:
Evaluate realistically if there is a need to support multiple RDBMS's. For example if you are writing an open source web application, chances are that you need to support MySQL and PostgreSQL at least (if not MSSQL and Oracle)
After the evaluation, make the most of the platform you decided upon
And BTW: you are mixing relational with non-relational databases (CouchDB is not an RDBMS comparable with Oracle, for example), further exemplifying the point that the perceived need for portability is often greatly overestimated.
"isn't that [text manipulation] more of something that should be programmed into the application?"
Usually, yes. The generally accepted "three-tier" application design for databases says that your logic should be in the middle tier, between the client and the database. However, sometimes you need some logic in a trigger or need to index on a function, requiring that some code be placed into the database. In that case all the usual "which language should I use?" questions come up.
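One concrete case, as a sketch with hypothetical names: an IMMUTABLE function used in an expression index, which is logic that has to live in the database rather than the middle tier.

CREATE OR REPLACE FUNCTION normalize_email(addr text)
RETURNS text
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
    RETURN lower(trim(addr));  -- must be deterministic to be indexable
END;
$$;

CREATE INDEX users_email_norm_idx ON users (normalize_email(email));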
If you only need a little logic, the most-portable language should probably be used (pl/pgSQL). If you need to do some serious programming though, you might be better off using a more expressive language (maybe pl/ruby). This will always be a judgment call.
"is there any valid reason to use an untrusted language?"
As above, yes. Again, putting direct file access (for example) into your middle tier is best when possible, but if you need to fire things off based on triggers (that might need access to data not available directly to your middle tier), then you need untrusted languages. It's not ideal, and should generally be avoided. And you definitely need to guard access to it.
These days, any "unique" or "cool" feature in a DBMS makes me incredibly nervous. I break out in a rash and have to stop work until the itching goes away.
I just hate to be locked in to a platform unnecessarily. Suppose you build a big chunk of your system in PL/Perl inside the database. Or in C# within SQL Server, or PL/SQL within Oracle, there are plenty of examples*.
Now you suddenly discover that your chosen platform doesn't scale. Or isn't fast enough. Or something. Worse, there's a new kid on the database block (something like MonetDB, CouchDB, Cache, say, but much cooler) that would solve all your problems (even if your only problem, like mine, is having an uncool database platform). And you can't switch to it without recoding half your application.
(*Admittedly, the paid-for products are to some extent seeking to lock you in by persuading you to use their unique features, which is not an accusation that can directly be levelled at the free providers, but the effect is the same).
So that's a rant on the first part of the question. Heart-felt, though.
is there any valid reason to use an untrusted language? It seems like making it so that any user can execute any operation would be a bad idea
My goodness, yes it does! A sort of "Perl injection attack"? Almost worth doing it just to see what happens, I'd have thought.
For philosophical reasons outlined above I think I'll pass on the PL/LOLCODE challenge. Although I was somewhat amazed to discover it was a link to something extant.
From my perspective, I guess the answer is 'it depends'.
There is an argument that manipulation of the data belongs in the database layer, so that the business logic does not need to be overly concerned about how the manipulation happens, it just knows that it has.
Another very good reason to process data in the DB layer is when the volume of data being crunched means that network bandwidth becomes an issue. I once had to categorise very large amounts of data. Processing this in the application layer was severely restricted by the time required to transfer all the data across the network for processing.
I then wrote a binning algorithm in PL/pgSQL and it worked much faster.
Regarding untrusted languages, I heard a podcast from Josh Berkus (a Postgres advocate) who discussed an application of PostgreSQL that brought in data from MySQL as part of its processing, so that the communication itself was handled by the Postgres server. I don't remember the full details; I think it was on the FLOSS Weekly podcast, which is quite an interesting discussion of the history of PostgreSQL and some of the uses it is put to.
The untrusted versions of the procedural languages allow you to access I/O on the system.
This can come in handy if you need a trigger or something to send an email or connect to a socket server to send a popup notification. There are tons of uses for this type of thing, and because of PostgreSQL's isolation levels you can safely do things like this.
You can put checkpoints in the function so if the transaction fails the email or whatever won't go out. The nice thing about doing this is it removes the logic from the client and puts it on the server.
I think most additional languages are offered so that if you develop in that language on a regular basis, you can feel comfortable writing db functions, triggers, etc. The usefulness of these features is to provide a control over data as close to the data as possible.
An example of a useful stored procedure I recently wrote in an external language that would not have been possible in pl/sql is a version of 'df' which allowed SQL table generators to pick a tablespace with the most free space available at runtime.
I used plperlu, and it was relatively simple, although I had to be careful with data typing.
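Something along these lines (a hedged sketch, not the original function): plperlu can shell out to df, which no trusted language allows.

CREATE OR REPLACE FUNCTION most_free_mount()
RETURNS text
LANGUAGE plperlu
AS $$
    my ($best, $best_avail) = (undef, -1);
    foreach my $line (split /\n/, `df -P -k`) {
        next if $line =~ /^Filesystem/;              # skip the header row
        my @f = split /\s+/, $line;
        ($best, $best_avail) = ($f[5], $f[3]) if $f[3] > $best_avail;
    }
    return $best;
$$;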