Prevent use of pre ANSI-92 old syntax

Prevent use of pre ANSI-92 old syntax - sql

I wonder if there's a way to prevent the creation of objects that contain old ansi sintax of join, maybe server triggers, can anyone help me?

You can create a DDL trigger and mine the eventdata() XML for the content of the proc. If you can detect the old syntax using some fancy string-parsing functions (maybe looking for commas between known table names or looking for *= or =*), then you can roll back the creation of the proc or function.

First reaction - code reviews and a decent QA process!
I've had some success looking at sys.syscomments.text. A simple where text like '%*=%' should do. Be aware that long SQL strings may be split across multiple rows. I realise this won't prevent objects getting in there in the first place. But then DDL triggers won't tell you how big your current problem is.

Although I fully understand your effort, I believe that this type of actions is the wrong way of getting where you want. First of all, you might get into serious trouble with your boss and, depending of where you work, get fired.
Second, as stated before, doing code reviews, explaining why the old syntax sucks. You have to have a decent reason why one should avoid the *= stuff. 'Because you don't like it' is not a feasible argument. In fact, there are quite some articles around showing that certain problems are just not solvable using this type of syntax.
Third, you might want to point out that separating conditions into grouping (JOIN ... ON...) and filtering conditions (WHERE...) increases the readability and might therefore be an options.
Collect your arguments and convince your colleagues rather than punishing them in quite an arrogant way.

Related

Should triggers only be used for something that's not possible with SQL?

I'm currently writing an application that uses SQLite. SQLite doesn't have the ON UPDATE function for timestamps so I wrote my first trigger for it. Then I wrote a trigger to add the current timestamp on the modified and created fields on an insert.
The problem came when I went and deleted the setting of the modified/created fields for the insert. I felt like I was hiding something from developers that might see my code in the future. It could be a source of confusion.
How will they know that the sql is coming from a trigger. Should I comment it? Is it bad practice?

As a rule of thumb, triggers are meant to implement SQL functional rules, such as inclusions, exclusions, partitions etc.
This kind of thing belongs to the model and should be implemented as triggers whenever it is possible. It has to be delivered with the database otherwise the model would be broken.
Regarding to your case, it is more a hack than anything. If you can't do differently, do it and then add a comment like you said. But it should remain an exception.
Keep in mind that almost everything a trigger is doing could be done at the application layer (whichever you want)

Good observation. There are some things only triggers can do. However I suggest that if there is any alternative to using a trigger then use the alternative. I'm not familiar with SQLite, but in any other database I would use a DEFAULT rather than a trigger to timestamp a new record. To capture updated date I would enclose this in a stored procedure, or whatever database side logic you have (kind of what RandomUs1r suggested). I might consider a trigger but only for very basic operations.
You are correct that triggers can be confusing and difficult to debug.

"I felt like I was hiding something from developers..." - this is a very good point. I've came across many developers who use ##Identity and were genuinely shocked that if somebody put a trigger on the table which inserted another row, they'd end up with the wrong identity. (As opposed to SCOPE_IDENTITY() - I know these are sql server specific, but that's pretty much all I know...)
It is hidden - other than documentation I'm not sure you can make it more visible either.
Which is why many avoid them where possible - I guess if there's no easy way around using them in some cases then as long as its well documented, etc. I think like cursors, although scorned by many they can be very powerful and useful...but if they can be avoided probably for the best.

On the code that modifies the record, to get the current timestamp... in SQLLite, try:
DATETIME('NOW')

How do you think while formulating Sql Queries. Is it an experience or a concept?

I have been working on sql server and front end coding and have usually faced problem formulating queries.
I do understand most of the concepts of sql that are needed in formulating queries but whenever some new functionality comes into the picture that can be dont using sql query, i do usually fails resolving them.
I am very comfortable with select queries using joins and all such things but when it comes to DML operation i usually fails
For every query that i never done before I usually finds uncomfortable with that while creating them. Whenever I goes for an interview I usually faces this problem.
Is it their some concept behind approaching on formulating sql queries.
Eg.
I need to create an sql query such that
A table contain single column having duplicate record. I need to remove duplicate records.
I know i can find the solution to this query very easily on Googling, but I want to know how everyone comes to the desired result.
Is it something like Practice Makes Man Perfect i.e. once you did it, next time you will be able to formulate or their is some logic or concept behind.
I could have get my answer of solving above problem simply by posting it on stackoverflow and i would have been with an answer within 5 to 10 minutes but I want to know the reason. How do you work on any new kind of query. Is it a major contribution of experience or some an implementation of concepts.
Whenever I learns some new thing in coding section I tries to utilize it wherever I can use it. But here scenario seems to be changed because might be i am lagging in some concepts.
EDIT
How could I test my knowledge and
concepts in Sql and related sql
queries ?

Typically, the first time you need to open a child proof bottle of pills, you have a hard time, but after that you are prepared for what it might/will entail.
So it is with programming (me thinks).
You find problems, research best practices, and beat your head against a couple of rocks, but in the process you will come to have a handy set of tools.
Also, reading what others tried/did, is a good way to avoid major obsticles.
All in all, with a lot of practice/coding, you will see patterns quicker, and learn to notice where to make use of what tool.

I have a somewhat methodical method of constructing queries in general, and it is something I use elsewhere with any problem solving I need to do.
The first step is ALWAYS listing out any bits of information I have in a request. Information is essentially anything that tells me something about something.
A table contain single column having
duplicate record. I need to remove
duplicate
I have a table (I'll call it table1)
I have a
column on table table1 (I'll call it col1)
I have
duplicates in col1 on table table1
I need to remove
duplicates.
The next step of my query construction is identifying the action I'll take from the information I have.
I'll look for certain keywords (e.g. remove, create, edit, show, etc...) along with the standard insert, update, delete to determine the action.
In the example this would be DELETE because of remove.
The next step is isolation.
Asnwer the question "the action determined above should only be valid for ______..?" This part is almost always the most difficult part of constructing any query because it's usually abstract.
In the above example you're listing "duplicate records" as a piece of information, but that's really an abstract concept of something (anything where a specific value is not unique in usage).
Isolation is also where I test my action using a SELECT statement.
Every new query I run gets thrown through a select first!
The next step is execution, or essentially the "how do I get this done" part of a request.
A lot of times you'll figure the how out during the isolation step, but in some instances (yours included) how you isolate something, and how you fix it is not the same thing.
Showing duplicated values is different than removing a specific duplicate.
The last step is implementation. This is just where I take everything and make the query...
Summing it all up... for me to construct a query I'll pick out all information that I have in the request. Using the information I'll figure out what I need to do (the action), and what I need to do it on (isolation). Once I know what I need to do with what I figure out the execution.
Every single time I'm starting a new "query" I'll run it through these general steps to get an idea for what I'm going to do at an abstract level.
For specific implementations of an actual request you'll have to have some knowledge (or access to google) to go further than this.
Kris

I think in the same way I cook dinner. I have some ingredients (tables, columns etc.), some cooking methods (SELECT, UPDATE, INSERT, GROUP BY etc.) then I put them together in the way I know how.
Sometimes I will do something weird and find it tastes horrible, or that it is amazing.
Occasionally I will pick up new recipes from the internet or friends, then use parts of these in my own.
I also save my recipes in handy repositories, broken down into reusable chunks.

On the "Delete a duplicate" example, I'd come to the result by googling it. This scenario is so rare if the DB is designed properly that I wouldn't bother keeping this information in my head. Why bother, when there is a good resource is available for me to look it up when I need it?
For other queries, it really is practice makes perfect.
Over time, you get to remember frequently used patterns just because they ARE frequently used. Rare cases should be kept in a reference material. I've simply got too much other stuff to remember.

Find a good documentation to your software. I am using Mysql a lot and Mysql has excellent documentation site with decent search function so you get many answers just by reading docs. If you do NOT get your answer at least you are learning something.
Than I set up an example database (or use the one I am working on) and gradually build my SQL. I tend to separate the problem into small pieces and solve it step by step - this is very successful if you are building queries including many JOINS - it is best to start with some particular case and "polute" your SQL with many conditions like WHEN id = "123" which you are taking out as you are working towards your solution.
The best and fastest way to learn good SQL is to work with someone else, preferably someone who knows more than you, but it is not necessarry condition. It can be replaced by studying mature code written by others.

Your example is a test of how well you understand the DISTINCT keyword and the GROUP BY clause, which are SQL's ways of dealing with duplicate data.

Examples and experience. You look at other peoples examples and you create your own code and once it groks, you don't need to think about it again.

I would have a look at the Mere Mortals book - I think it's the one by Hernandez. I remember that when I first started seriously with SQL Server 6.5, moving from manual ISAM databases and Access database systems using VB4, that it was difficult to understand the syntax, the joins and the declarative style. And the SQL queries, while powerful, were very intimidating to understand - because typically, I was looking at generated code in Microsoft Access.
However, once I had developed a relatively systematic approach to building queries in a consistent and straightforward fashion, my skills and confidence quickly moved forward.

From seeing your responses you have two options.
Have a copy of the specification for whatever your working on (SQL spec and the documentation for the SQL implementation (SQLite, SQL Server etc..)
Use Google, SO, Books, etc.. as a resource to find answers.
You can't formulate an answer to a problem without doing one of the above. The first option is to become well versed into the capabilities of whatever you are working on.
The second option allows you to find answers that you may not even fully know how to ask. You example is fairly simplistic, so if you read the spec/implementation documentaion you would know the answer right away. But there are times, where even if you read the spec/documentation you don't know the answer. You only know that it IS possible, just not how to do it.
Remember that as far as jobs and supervisors go, being able to resolve a problem is important, but the faster you can do it the better which can often be done with option 2.

metaprogramming with stored procedures?

This is going to be both a direct question and a discussion point. I'll ask the direct question first:
Can a stored procedure create another stored procedure dynamically? (Personally I'm interested in SQL Server 2008, but in the interests of wider discussion will leave it open)
Now to the reason I'm asking. Briefly (you can read more elsewhere), User Defined Scalar Functions in SQL Server are performance bottlenecks, at best. I've seen uses in our code base that slow the total query down by 3-4x, but from what I've read the local impact of the S-UDF can be 10x+
However, UDFs are, potentially, great for raising abstraction levels, reducing lots of tedious boilerplate, centralising logic rules etc. In most cases they boil down to simple expressions that could easily be expanded inline - but they're not (I'm really only thinking of non-querying functions - e.g. string manipluations). I've seen a bug report for this to be addressed in a future release - with some buy-in from MS. But for now we have to live with the (IMHO) broken implementation.
One workaround is to use a table value UDF instead - however these complicate the client code in ways you don't always want to deal with (esp. when the UDF just computes the result of an expression).
So my crazy idea, at first, was to write the procs with C Preprocessor directives, then pass it through a preprocessor before submitting to the RDBMS. This could work, but has its own problems.
That led me to my next crazy idea, which was to define the "macros" in the DB itself, and have a master proc that accepts a string containing an unprocessed SP with macros, expands the macros inline, then submits it on to the RDMS. This is not what SPs are good at, but I think it could work - assuming you can do this in the first place - hence my original question.
However, now I have explained my path to the question, I'd also like to leave it open for other ideas. I'm sure I'm not the only one who has been thinking along these lines. Perhaps there are third-party solutions already out there? My googling has not turned up much yet.
Also I thought it would be a fun discussion topic.
[edit]
This blog post I found in my research describes the same issue I'm seeing. I'd be happy if anyone could point out something that I, or the blog poster, might be doing wrong that leads to the overhead.
I should also add that I am using WITH SCHEMABINDING on my S-UDF, although it doesn't seem to be giving me any advantage

your string processing UDF won't be a perf problem. Scalar UDF's are a problem only when they perform selects and those selects are done for every row. this in turn spikes the IO.
string manipulaation on the other hand is done in memory and is fast.
as for your idea i can't really see any benefit of it. creating and dropping objects like that can be an expensive operation and may lead to schema locking.

Is there a better way to debug SQL?

I have worked with SQL for several years now, primarily MySQL/PhpMyAdmin, but also Oracle/iSqlPlus and PL/SQL lately. I have programmed in PHP, Java, ActionScript and more. I realise SQL isn't an imperative programming language like the others - but why do the error messages seem so much less specific in SQL? In other environments I'm pointed straight to the root of the problem. More often that not, MySQL gives me errors like "error AROUND where u.id = ..." and prints the whole query. This is even more difficult with stored procedures, where debugging can be a complete nightmare.
Am I missing a magic tool/language/plugin/setting that gives better error reporting or are we stuck with this? I want a debugger or language which gives me the same amount of control that Eclipse gives me when setting breakpoints and stepping trough the code. Is this possible?

I think the answer lies in the fact that SQL is a set-based language with a few procedural things attached. Since the designers were thinking in set-based terms, they didn't think that the ordinary type of debugging that other languages have is important. However, I think some of this is changing. You can set breakpoints in SQL Server 2008. I haven't used it really as you must have SQL Server 2008 databases before it will work and most of ours are still SQL Server 2000. But it is available and it does allow you to step through things. You still are going to have problems when your select statement is 150 lines long and it knows that the syntax isn't right but it can't point out exactly where as it is all one command.
Personally when I am writing a long procedural SP, I build in a test mode that includes showing me the results of things I do, the values of key variables at specific points I'm interested in, and print staments that let me know what steps have been completed and then rolling the whole thing back when done. That way I can see what would have happened if it had run for real, but not have hurt any of the data in the database if I got it wrong. I find this very useful. It can vastly increase the size of your proc though. I have a template I use that has most of the structure I need set up in it, so it doesn't really take me too long to do. Especially since I never add an insert. update or delete to a proc without first testing the associated select to ensure I have the records I want.

I think the explanation is that "regular" languages have much smaller individual statements than SQL, so that single-statement granularity points to a much smaller part of the code in them than in SQL. A single SQL statement can be a page or more in length; in other languages it's usually a single line.
I don't think that makes it impossible for debuggers / IDEs to more precisely identify errors, but I suspect it makes it harder.

I agree with your complaint.
Building a good logging framework and overusing it in your sprocs is what works best for me.
Before and after every transaction or important piece of logic, I write out the sproc name, step timestamp and a rowcount (if relevant) to my log table. I find that when I have done this, I can usually narrow down the problem spot within a few minutes.
Add a debug parameter to the sproc (default to "N") and pass it through to any other sprocs that it calls so that you can easily turn logging on or off.

As for breakpoints and stepping through code, you can do this with MS SQL Server (in my opinion, it's easier on 2005+ than with 2000).
For the simple cases, early development debugging, the sometimes cryptic messages are usually good enough to get the error resolved -- syntax error, can't do X with Y. If I'm in a tough sproc, I'll revert to "printf debugging" on the sproc text because it's quick and easy. After a while with your database of choice, the simple issues become old hat and you just take them in stride.
However, once the code is released, the complexity of the issues is way too high. I consider myself lucky if I can reproduce them. Also, the places where the developer in me would want a debugger the DBA in me says "no way you're putting a debugger there."

I do use the following tactics.
During writing of the stored procedure have a #procStep var
each time a new logical step is executed
set #procStep = "What the ... is happening here " ;
the rest is here

SQL With A Safety Net

My firm have a talented and smart operations staff who are working very hard. I'd like to give them a SQL-execution tool that helps them avoid common, easily-detected SQL mistakes that are easy to make when they are in a hurry. Can anyone suggest such a tool? Details follow.
Part of the operations team remit is writing very complex ad-hoc SQL queries. Not surprisingly, operators sometimes make mistakes in the queries they write because they are so busy.
Luckily, their queries are all SELECTs not data-changing SQL, and they are running on a copy of the database anyway. Still, we'd like to prevent errors in the SQL they run. For instance, sometimes the mistakes lead to long-running queries that slow down the duplicate system they're using and inconvenience others until we find the culprit query and kill it. Worse, occasionally the mistakes lead to apparently-correct answers that we don't catch until much later, with consequent embarrassment.
Our developers also make mistakes in complex code that they write, but they have Eclipse and various plugins (such as FindBugs) that catch errors as they type. I'd like to give operators something similar - ideally it would see
SELECT U.NAME, C.NAME FROM USER U, COMPANY C WHERE U.NAME = 'ibell';
and before you executed, it would say "Hey, did you realise that's a Cartesian product? Are you sure you want to do that?" It doesn't have to be very smart - finding obviously missing join conditions and similar evident errors would be fine.
It looks like TOAD should do this but I can't seem to find anything about such a feature. Are there other tools like TOAD that can provide this kind of semi-intelligent error correction?
Update: I forgot to mention that we're using MySQL.

If your people are using the mysql(1) program to run queries, you can use the safe-updates option (aka i-am-a-dummy) to get you part of what you need. Its name is somewhat misleading; it not only prevents UPDATE and DELETE without a WHERE (which you're not worried about), but also adds an implicit LIMIT 1000 to SELECT statements, and aborts SELECTs that have joins and are estimated to consider over 1,000,000 tuples --- perfect for discouraging Cartesian joins.

..."writing very complex ad-hoc SQL queries.... they are so busy"
Danger Will Robinson!
Automate Automate Automate.
Ideally, the ops team should not be put into a position where they have to write queries on the fly in a high stress situation – it’s a recipe for disaster! Better for them to build up a library of pre-written scripts that have undergone the appropriate testing to make sure it a) does what you want b) provides an audit trail c) has a possible ‘undo’ type function.
Failing that, giving them a user ID that only has SELECT premissions might help :-)

You might find SQL Prompt from redgate useful. I'm not sure what database engine you're using, as it's only for MSSQL Server

I'm not expecting anything like this to exist. The tool would have to first implement everything that the SQL parser in your database implements, and then it would have to do a data model analysis to predict "bad" queries.
Your best bet might be to write a plugin for a text editor that did some basic checking for suspicious patterns and highlighted them differently than the standard .sql mode. But even that would be quite difficult.
I would be happy with a tool that set off alarm bells whenever I typed in an update statement without a where clause. And perhaps administered a mild electric shock, since it's usually about 1 in the morning after a long day when mistakes like that happen.

It would be pretty easy to build this by setting up a sample database with a extremely small amount of dummy data, which would receive the query first. A couple of things will happen:
You might get a SQL syntax error, which would not load the database much since it's a small database.
You might get back a response which could clearly be shown to contain every row in one or more tables, which is probably not what they want.
Things which pass the above conditions are likely to be okay, so you can run them against the copy of the production database.
Assuming your schema doesn't change much and is not particularly weird, writing the above is likely the quickest solution to your problem.

I'd start with some coding standards - for instance never use the type of join in your example - it often results in bad results (especially in SQL Server if you try to do an outer join that way, you will get bad results). require them to do explicit joins.
If you have complex relationships, you might consider putting them in views and then writing the adhoc queries from the views. Then at least they will never make the mistake of getting the joins wrong.

Can't you just limit the amount of time a query can run for? I'm not sure about MySQL, but for SQL Server, even just the default query analyzer can restrict how long queries will run before they time out. Couple that with limited rights so they can only run SELECT queries, and you should be pretty much covered.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas