Is there a better way to debug SQL?

I have worked with SQL for several years now, primarily MySQL/phpMyAdmin, but also Oracle/iSQL*Plus and PL/SQL lately. I have programmed in PHP, Java, ActionScript and more. I realise SQL isn't an imperative programming language like the others - but why do the error messages seem so much less specific in SQL? In other environments I'm pointed straight to the root of the problem. More often than not, MySQL gives me errors like "error AROUND where u.id = ..." and prints the whole query. This is even more difficult with stored procedures, where debugging can be a complete nightmare.
Am I missing a magic tool/language/plugin/setting that gives better error reporting, or are we stuck with this? I want a debugger or language which gives me the same amount of control that Eclipse gives me when setting breakpoints and stepping through the code. Is this possible?

I think the answer lies in the fact that SQL is a set-based language with a few procedural things attached. Since the designers were thinking in set-based terms, they didn't consider the ordinary kind of debugging that other languages have to be important. However, I think some of this is changing. You can set breakpoints in SQL Server 2008. I haven't really used it, as it requires SQL Server 2008 databases and most of ours are still SQL Server 2000, but it is available and it does allow you to step through things. You are still going to have problems when your select statement is 150 lines long: the engine knows the syntax isn't right, but it can't point out exactly where, because it is all one command.
Personally, when I am writing a long procedural SP, I build in a test mode that shows me the results of what I do and the values of key variables at specific points I'm interested in, prints statements letting me know which steps have been completed, and then rolls the whole thing back when done. That way I can see what would have happened if it had run for real, without having hurt any of the data in the database if I got it wrong. I find this very useful. It can vastly increase the size of your proc, though. I have a template with most of the structure I need already set up, so it doesn't really take me too long to do. Especially since I never add an insert, update or delete to a proc without first testing the associated select to ensure I have the records I want.
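A minimal sketch of that test-mode pattern; the table (dbo.Orders) and column (Status) are invented for illustration:

CREATE PROCEDURE dbo.ArchiveClosedOrders
    @TestMode BIT = 1   -- default to test mode so a casual EXEC changes nothing
AS
BEGIN
    DECLARE @rc INT;
    BEGIN TRANSACTION;

    -- test the associated select first: these are the rows the update will touch
    IF @TestMode = 1
        SELECT * FROM dbo.Orders WHERE Status = 'Closed';

    UPDATE dbo.Orders SET Status = 'Archived' WHERE Status = 'Closed';
    SET @rc = @@ROWCOUNT;   -- capture immediately, before anything resets it

    IF @TestMode = 1
    BEGIN
        PRINT 'Archive step completed; rows affected: ' + CAST(@rc AS VARCHAR(10));
        ROLLBACK TRANSACTION;   -- see what would have happened without hurting data
    END
    ELSE
        COMMIT TRANSACTION;
END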

I think the explanation is that "regular" languages have much smaller individual statements than SQL, so that single-statement granularity points to a much smaller part of the code in them than in SQL. A single SQL statement can be a page or more in length; in other languages it's usually a single line.
I don't think that makes it impossible for debuggers / IDEs to more precisely identify errors, but I suspect it makes it harder.

I agree with your complaint.
Building a good logging framework and overusing it in your sprocs is what works best for me.
Before and after every transaction or important piece of logic, I write out the sproc name, step timestamp and a rowcount (if relevant) to my log table. I find that when I have done this, I can usually narrow down the problem spot within a few minutes.
Add a debug parameter to the sproc (default to "N") and pass it through to any other sprocs that it calls so that you can easily turn logging on or off.
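A minimal sketch of that logging approach; the log table and the proc it instruments are invented names:

CREATE TABLE dbo.ProcLog (
    LogId    INT IDENTITY PRIMARY KEY,
    ProcName SYSNAME,
    StepName VARCHAR(100),
    RowCnt   INT NULL,
    LoggedAt DATETIME DEFAULT GETDATE()
);
GO
CREATE PROCEDURE dbo.LoadDailySales
    @Debug CHAR(1) = 'N'   -- pass through to any sprocs this one calls
AS
BEGIN
    DECLARE @rc INT;
    IF @Debug = 'Y'
        INSERT dbo.ProcLog (ProcName, StepName) VALUES ('LoadDailySales', 'start');

    -- ... an important piece of logic (hypothetical) ...
    UPDATE s SET s.Total = d.Total
    FROM dbo.Sales s JOIN dbo.DailyFeed d ON d.SaleId = s.SaleId;
    SET @rc = @@ROWCOUNT;

    IF @Debug = 'Y'
        INSERT dbo.ProcLog (ProcName, StepName, RowCnt)
        VALUES ('LoadDailySales', 'after sales update', @rc);
END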

As for breakpoints and stepping through code, you can do this with MS SQL Server (in my opinion, it's easier on 2005+ than with 2000).
For the simple cases, early development debugging, the sometimes cryptic messages are usually good enough to get the error resolved -- syntax error, can't do X with Y. If I'm in a tough sproc, I'll revert to "printf debugging" on the sproc text because it's quick and easy. After a while with your database of choice, the simple issues become old hat and you just take them in stride.
However, once the code is released, the complexity of the issues is way too high. I consider myself lucky if I can reproduce them. Also, the places where the developer in me would want a debugger the DBA in me says "no way you're putting a debugger there."

I use the following tactic: while writing the stored procedure, keep a @procStep variable (note the @ prefix - # would make it a temp table), and each time a new logical step is executed, set it:
set @procStep = 'What the ... is happening here';
That way, when something blows up, the error handler can report exactly which step was in progress.
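A minimal sketch of the tactic using TRY/CATCH (SQL Server 2005+); the step names and logic are placeholders:

DECLARE @procStep VARCHAR(200);
BEGIN TRY
    SET @procStep = 'loading staging table';
    -- ... step 1 logic ...

    SET @procStep = 'merging into production';
    -- ... step 2 logic ...
END TRY
BEGIN CATCH
    -- the failed step is now obvious, even if the error message isn't
    RAISERROR ('Failed during step: %s', 16, 1, @procStep);
END CATCH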

Related

Prevent use of pre ANSI-92 old syntax

I wonder if there's a way to prevent the creation of objects that contain the old ANSI join syntax - maybe server triggers? Can anyone help me?
You can create a DDL trigger and mine the eventdata() XML for the content of the proc. If you can detect the old syntax using some fancy string-parsing functions (maybe looking for commas between known table names or looking for *= or =*), then you can roll back the creation of the proc or function.
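A sketch of that DDL trigger; the string test here is deliberately naive and would need the fancier parsing mentioned above:

CREATE TRIGGER ddl_block_old_joins
ON DATABASE
FOR CREATE_PROCEDURE, ALTER_PROCEDURE, CREATE_FUNCTION, ALTER_FUNCTION
AS
BEGIN
    DECLARE @cmd NVARCHAR(MAX);
    SET @cmd = EVENTDATA().value(
        '(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'NVARCHAR(MAX)');

    IF @cmd LIKE '%*=%' OR @cmd LIKE '%=*%'
    BEGIN
        RAISERROR ('Old-style outer join syntax (*= / =*) is not allowed.', 16, 1);
        ROLLBACK;   -- undo the CREATE/ALTER
    END
END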
First reaction - code reviews and a decent QA process!
I've had some success looking at sys.syscomments.text. A simple where text like '%*=%' should do. Be aware that long SQL strings may be split across multiple rows. I realise this won't prevent objects getting in there in the first place. But then DDL triggers won't tell you how big your current problem is.
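Something along these lines should find the existing offenders (the DISTINCT collapses definitions split across rows, though a *= broken exactly at a row boundary could still slip through):

SELECT DISTINCT OBJECT_NAME(c.id) AS object_name
FROM sys.syscomments c
WHERE c.text LIKE '%*=%' OR c.text LIKE '%=*%';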
Although I fully understand your effort, I believe that this type of action is the wrong way of getting where you want. First of all, you might get into serious trouble with your boss and, depending on where you work, get fired.
Second, as stated before, do code reviews and explain why the old syntax sucks. You have to have a decent reason why one should avoid the *= stuff; 'because you don't like it' is not a feasible argument. In fact, there are quite a few articles around showing that certain problems are just not solvable using this type of syntax.
Third, you might want to point out that separating conditions into joining (JOIN ... ON ...) and filtering conditions (WHERE ...) increases readability and might therefore be an option.
Collect your arguments and convince your colleagues rather than punishing them in quite an arrogant way.

metaprogramming with stored procedures?

This is going to be both a direct question and a discussion point. I'll ask the direct question first:
Can a stored procedure create another stored procedure dynamically? (Personally I'm interested in SQL Server 2008, but in the interests of wider discussion will leave it open)
Now to the reason I'm asking. Briefly (you can read more elsewhere), User Defined Scalar Functions in SQL Server are performance bottlenecks, at best. I've seen uses in our code base that slow the total query down by 3-4x, but from what I've read the local impact of the S-UDF can be 10x+
However, UDFs are, potentially, great for raising abstraction levels, reducing lots of tedious boilerplate, centralising logic rules, etc. In most cases they boil down to simple expressions that could easily be expanded inline - but they're not (I'm really only thinking of non-querying functions - e.g. string manipulations). I've seen a bug report for this to be addressed in a future release - with some buy-in from MS. But for now we have to live with the (IMHO) broken implementation.
One workaround is to use a table value UDF instead - however these complicate the client code in ways you don't always want to deal with (esp. when the UDF just computes the result of an expression).
So my crazy idea, at first, was to write the procs with C Preprocessor directives, then pass it through a preprocessor before submitting to the RDBMS. This could work, but has its own problems.
That led me to my next crazy idea, which was to define the "macros" in the DB itself, and have a master proc that accepts a string containing an unprocessed SP with macros, expands the macros inline, then submits it on to the RDBMS. This is not what SPs are good at, but I think it could work - assuming you can do this in the first place - hence my original question.
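To answer the direct question first: yes, a proc can create another proc, by building the DDL as a string and executing it as its own batch. A minimal sketch of the "master proc" idea - the macro table and all names are invented, and REPLACE-based expansion only handles parameterless macros:

CREATE TABLE dbo.SqlMacros (
    MacroName VARCHAR(100) PRIMARY KEY,   -- e.g. '$SQUEEZE$'
    MacroBody NVARCHAR(MAX)               -- the text to expand it to
);
GO
CREATE PROCEDURE dbo.CreateProcWithMacros
    @ProcSource NVARCHAR(MAX)   -- unexpanded CREATE PROCEDURE text
AS
BEGIN
    DECLARE @name VARCHAR(100), @body NVARCHAR(MAX);

    DECLARE macro_cursor CURSOR FOR
        SELECT MacroName, MacroBody FROM dbo.SqlMacros;
    OPEN macro_cursor;
    FETCH NEXT FROM macro_cursor INTO @name, @body;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @ProcSource = REPLACE(@ProcSource, @name, @body);
        FETCH NEXT FROM macro_cursor INTO @name, @body;
    END
    CLOSE macro_cursor;
    DEALLOCATE macro_cursor;

    -- CREATE PROCEDURE must be the only statement in its batch,
    -- so the expanded source is submitted as a separate batch here
    EXEC sys.sp_executesql @ProcSource;
END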
However, now I have explained my path to the question, I'd also like to leave it open for other ideas. I'm sure I'm not the only one who has been thinking along these lines. Perhaps there are third-party solutions already out there? My googling has not turned up much yet.
Also I thought it would be a fun discussion topic.
[edit]
This blog post I found in my research describes the same issue I'm seeing. I'd be happy if anyone could point out something that I, or the blog poster, might be doing wrong that leads to the overhead.
I should also add that I am using WITH SCHEMABINDING on my S-UDF, although it doesn't seem to be giving me any advantage
Your string processing UDF won't be a perf problem. Scalar UDFs are a problem only when they perform selects, and those selects are done for every row; this in turn spikes the IO.
String manipulation, on the other hand, is done in memory and is fast.
As for your idea, I can't really see any benefit to it. Creating and dropping objects like that can be an expensive operation and may lead to schema locking.

What's your favored method for debugging MS SQL stored procedures?

Most of my SPs can simply be executed (and tested) with data entered manually. This works well and using simple PRINT statements allows me to "debug".
There are however cases where more than one stored procedure is involved and finding valid data to input is tedious. It's easier to just trigger things from within my web app.
I have a little experience with the profiler but I haven't found a way to explore what's going on line by line in my stored procedures.
What are your methods?
Thank you, as always.
Note: I'm assuming use of SQL Server 2005+
Profiler is very handy, just add SP:StmtStarting events, and filter the activity down to just your process by setting SPID=xxx. Once you have it set up, it's a breeze to see what's going on.
You can actually attach a debugger to your SQL Server :) - from Visual Studio, given you have configured that on your SQL Server.
Check this link for more info, notice you can set break points :) https://web.archive.org/web/20090303135325/http://dbazine.com/sql/sql-articles/cook1.
Check this link for a more general set of info: http://msdn.microsoft.com/en-us/library/zefbf0t6.aspx
Update: Regarding "There are however cases where more than one stored procedure is involved and finding valid data to input is tedious. It's easier to just trigger things from within my web app."
I suggest you set up integration tests, focused on the specific methods that interact with those procedures. If the procedures are being driven by the web app, it is a great place to have valid tests + inputs you can run at any time. If there are multiple apps that interact with the procedures, I would look at unit testing the procedures instead.
I prefer to just use stored procs for dataset retrieval, and do any complex "work" on the application side. Because you are correct, trying to "debug" what's happening inside the guts of a many layered, cursor-looping, temp-table using, nested stored proc is very difficult.
That said, MS KB 316549 describes how to use visual studio to debug stored procs.
According to this article, there are a number of limitations to debugging in this fashion:
You cannot "break" execution.
You cannot "edit and continue."
You cannot change the order of statement execution.
Although you can change the value of variables, your changes may not take effect because the variable values are cached.
Output from the SQL PRINT statement is not displayed.
Edit: Obviously, if you are the person making this stored proc, then don't make it "many layered, cursor-looping, temp-table using, and nested". In my role as a DBA, though, that's pretty much what I encounter daily from the app developers.
This trick is pretty handy:
Custom user configurable Profiler Events
As far as not knowing what the valid inputs would be, you need to test a wide range of inputs, including (especially) invalid inputs. You should define your test cases before you write your procs. Then you have a reproducible set of tests to run every time someone changes the complex process.
My team uses SPs by rule as our interface to the database; we do it in a way that the application user can ONLY execute SPs (with our naming convention).
One best practice that we use, that works well, is that certain test scripts are contained within the SP comments, and must be executed on each rev of an SP, or development of a new SP.
You should always, ALWAYS test the SP as thoroughly as possible without any application layer involved (through Management Studio, for example).
Make sure you step into the main stored proc in VS2005/2008; when it encounters a nested function, hit F11 (Step Into) to enter it, and continue debugging. It was not very obvious from the debug menu.
I prefer not to debug, I do test driven development instead, which almost eliminates the need to debug.

What is a "reasonable" length of time to keep a SQL cursor open?

In your applications, what's a "long time" to keep a transaction open before committing or rolling back? Minutes? Seconds? Hours?
and on which database?
I'm probably going to get flamed for this, but you really should try and avoid using cursors as they incur a serious performance hit. If you must use it, you should keep it open the absolute minimum amount of time possible so that you free up the resources being blocked by the cursor ASAP.
Transactions: minutes.
Cursors: 0 seconds maximum; if you use a cursor, we fire you.
This is not ridiculous when you consider we are in a high-availability web environment that has to run SQL Server, and we don't even allow stored procs because of the inability to accurately version and maintain them. If we were using Oracle, maybe.
#lomaxx, #ChanChan: to the best of my knowledge cursors are only a problem on SQL Server and Sybase (T-SQL variants). If your database of choice is Oracle, then cursors are your friend. I've seen a number of cases where the use of cursors has actually improved performance. Cursors are an incredibly useful mechanism and tbh, saying things like "if you use a cursor we fire you" is a little ridiculous.
Having said that, you only want to keep a cursor open for the absolute minimum that is required. Specifying a maximum time would be arbitrary and pointless without understanding the problem domain.
Generally I agree with the other answers: Avoid cursors when possible (in most cases) and close them as fast as possible.
However: It all depends on the environment you're working in.
If it is a production website environment with lots of users, make sure that the cursor goes away before someone gets a timeout.
If you're - for example - writing a "log analyzing stored procedure" (or whatever) on a proprietary machine that does nothing else: feel free to do whatever you want to do. You'll be the only person who has to wait. It's not as if the database server is going to die because you use cursors. You should consider, though, that maybe usage behaviour will change over time and at some point there might be 10 people using that application. So try to find another way ;)
#ninesided: performance issues aside, it's also about using the right tool for the job. Given the choice to move the cursor out of your query into code, I would think 99 times out of 100 it would be better to put that looping logic into some sort of managed code. Doing so allows you to get the advantages of using a debugger, compile-time error checking, type safety, etc.
My answer to the question is still the same: if you're using a cursor, close it ASAP; in Oracle I'd also try to use explicit cursors.
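For the Oracle case, a minimal explicit-cursor sketch (the users table is hypothetical); the point is to keep the open/close window as narrow as possible:

DECLARE
    CURSOR c_users IS
        SELECT id, name FROM users;
    v_id   users.id%TYPE;
    v_name users.name%TYPE;
BEGIN
    OPEN c_users;
    LOOP
        FETCH c_users INTO v_id, v_name;
        EXIT WHEN c_users%NOTFOUND;
        -- process the row here
    END LOOP;
    CLOSE c_users;   -- close as soon as the work is done
END;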

SQL With A Safety Net

My firm have a talented and smart operations staff who are working very hard. I'd like to give them a SQL-execution tool that helps them avoid common, easily-detected SQL mistakes that are easy to make when they are in a hurry. Can anyone suggest such a tool? Details follow.
Part of the operations team remit is writing very complex ad-hoc SQL queries. Not surprisingly, operators sometimes make mistakes in the queries they write because they are so busy.
Luckily, their queries are all SELECTs not data-changing SQL, and they are running on a copy of the database anyway. Still, we'd like to prevent errors in the SQL they run. For instance, sometimes the mistakes lead to long-running queries that slow down the duplicate system they're using and inconvenience others until we find the culprit query and kill it. Worse, occasionally the mistakes lead to apparently-correct answers that we don't catch until much later, with consequent embarrassment.
Our developers also make mistakes in complex code that they write, but they have Eclipse and various plugins (such as FindBugs) that catch errors as they type. I'd like to give operators something similar - ideally it would see
SELECT U.NAME, C.NAME FROM USER U, COMPANY C WHERE U.NAME = 'ibell';
and before you executed, it would say "Hey, did you realise that's a Cartesian product? Are you sure you want to do that?" It doesn't have to be very smart - finding obviously missing join conditions and similar evident errors would be fine.
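For comparison, the query the operator presumably meant - assuming a hypothetical COMPANY_ID column on USER:

SELECT U.NAME, C.NAME
FROM USER U
JOIN COMPANY C ON C.ID = U.COMPANY_ID
WHERE U.NAME = 'ibell';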
It looks like TOAD should do this but I can't seem to find anything about such a feature. Are there other tools like TOAD that can provide this kind of semi-intelligent error correction?
Update: I forgot to mention that we're using MySQL.
If your people are using the mysql(1) program to run queries, you can use the safe-updates option (aka i-am-a-dummy) to get you part of what you need. Its name is somewhat misleading; it not only prevents UPDATE and DELETE without a WHERE (which you're not worried about), but also adds an implicit LIMIT 1000 to SELECT statements, and aborts SELECTs that have joins and are estimated to consider over 1,000,000 tuples --- perfect for discouraging Cartesian joins.
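If you can't control how the client is launched, the same guards can be set per session with standard MySQL variables:

SET SQL_SAFE_UPDATES = 1;     -- refuse UPDATE/DELETE without a keyed WHERE
SET SQL_SELECT_LIMIT = 1000;  -- implicit LIMIT on SELECTs
SET MAX_JOIN_SIZE = 1000000;  -- abort SELECTs estimated to examine more rows than this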
..."writing very complex ad-hoc SQL queries.... they are so busy"
Danger Will Robinson!
Automate Automate Automate.
Ideally, the ops team should not be put into a position where they have to write queries on the fly in a high-stress situation - it's a recipe for disaster! Better for them to build up a library of pre-written scripts that have undergone the appropriate testing to make sure each one a) does what you want, b) provides an audit trail, and c) has a possible 'undo' type function.
Failing that, giving them a user ID that only has SELECT permissions might help :-)
You might find SQL Prompt from Red Gate useful. I'm not sure what database engine you're using, though - it's only for MS SQL Server.
I'm not expecting anything like this to exist. The tool would have to first implement everything that the SQL parser in your database implements, and then it would have to do a data model analysis to predict "bad" queries.
Your best bet might be to write a plugin for a text editor that did some basic checking for suspicious patterns and highlighted them differently than the standard .sql mode. But even that would be quite difficult.
I would be happy with a tool that set off alarm bells whenever I typed in an update statement without a where clause. And perhaps administered a mild electric shock, since it's usually about 1 in the morning after a long day when mistakes like that happen.
It would be pretty easy to build this by setting up a sample database with an extremely small amount of dummy data, which would receive the query first. A couple of things will happen:
You might get a SQL syntax error, which would not load the database much since it's a small database.
You might get back a response which could clearly be shown to contain every row in one or more tables, which is probably not what they want.
Things which pass the above conditions are likely to be okay, so you can run them against the copy of the production database.
Assuming your schema doesn't change much and is not particularly weird, writing the above is likely the quickest solution to your problem.
I'd start with some coding standards - for instance, never use the type of join in your example; it often produces bad results (especially in SQL Server: if you try to do an outer join that way, you will get bad results). Require explicit joins.
If you have complex relationships, you might consider putting them in views and then writing the ad-hoc queries against the views. Then at least they will never make the mistake of getting the joins wrong.
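For example (the COMPANY_ID join column is hypothetical), a view like this bakes the join in once, so ad-hoc queries can't get it wrong:

CREATE VIEW user_company AS
SELECT u.name AS user_name, c.name AS company_name
FROM user u
JOIN company c ON c.id = u.company_id;

SELECT * FROM user_company WHERE user_name = 'ibell';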
Can't you just limit the amount of time a query can run for? I'm not sure about MySQL, but for SQL Server, even just the default query analyzer can restrict how long queries will run before they time out. Couple that with limited rights so they can only run SELECT queries, and you should be pretty much covered.