Confused on what to use, If Then or Case Else Statement - sql

I am trying to determine if I should use a CASE Statement or an IF THEN statement to get my results.
I want a SQL statement to run when a certain condition exists, but am not certain on how to check for the condition. Here is what I am working on
IF EXISTS(SELECT source FROM map WHERE rev_num =(SELECT MAX(rev_num) from MAP <-- at this point it would return either an A or B -->
What ever the answer is i then need to run a set of SQL's. So for A it would do this set of statements and for B it would do another.

CASE is used within a SQL statement. IF/THEN can be used to choose which query to execute.
Based on your somewhat vague example it seems like you want to execute different queries based on some condition. In that case, an IF/THEN seems more appropriate.
If, however, the majority of each query is identical and you're just changing part of the query then you may be able to use CASE to reduce the amount of duplicate code.

Related

Efficiency check in sql query

I'm struggle between 2 ways of execute my mission.
I have set of conditions, and when they are all true - I need to set "true" in an "x" column attribute of that record.
What is more efficient & recommended, and why?
set this column into "false" for all records in the table, and then
run another query to set "true" under some conditions.
set "true" if all conditions are "true", then run another query to set "false" on all records where one or more of the conditions fails.
I cannot assume that there is some default value of the "x" column need to be changed, because the query should run also one in a while, when its needed, after initial values were inserted to that column, and some conditions may be changed from the last time I ran this query.
Perhaps there is also another idea, more efficient that the 2 above?
Also I'd like to understand how to calculate efficiently of a query, something similar to the way of efficient calculation in programming.
Probably the most efficient way is to use a case Statement, since you have to evaluate the condition only once per row and also have to modify every row only once, e.g.
UPDATE tablename SET fieldname = (CASE WHEN conditions THEN 'true' ELSE 'false' END)
Case Statements are at least available in Oracle and MS-SQL. Dont know about other DB vendors.
if all conditions are similar in datastructure, I don't think it really matters.
If the conditions have different data query structures (for example, "my key isn't a foreign key in tableB" and "State = 'California'"), setting to true and then checking them one by one (as a sequence of SQL statements) is better, cause you can check the simple ones first and not bother about the complex ones for records that got their flag removed already.

LookupRecord in SQL statement with WHERE clause does not work

Background
I am creating a database to tracks lab samples. I wish to put a restriction in place that prevents a technician from reporting multiple results for the same sample.
Current strategy
My query called qselReport lists all samples being reported.
SELECT tblResult.strSampleName, tblResult.ysnReport FROM tblResult WHERE (((tblResult.ysnReport)=True));
When a technician wishes to report a result for a given sample, I use a Before Change Event to check for that sample in qselReport (the code block below is my event macro N.B. it is not VBA).
If Updated("ysnReport") And Old.[ysnReport]=False Then
Look Up A Record In qselReport
Where Condition = [strSampleName]=[tblResult].[strSampleName]
Alias
RaiseError
Error Number 1
Error Description This sample is already being reported.
End If
That all works fine and dandy. The error message pops up if a second result is selected to report for a sample.
The problem
I like to keep things as sleek as possible, so I don't want qselReport unless it's absolutely necessary. So I made a slight adjustment to the LookupRecord block so that it operates on a SQL statement rather than on the query. Here's what that looks like (again N.B. not VBA, just a macro):
If Updated("ysnReport") And Old.[ysnReport]=False Then
Look Up A Record In SELECT tblResult.strSampleName, tblResult.ysnReport FROM tblResult WHERE [tblResult].[ysnReport]=True;
Where Condition = [strSampleName]=[tblResult].[strSampleName]
Alias
RaiseError
Error Number 1
Error Description This sample is already being reported.
End If
Now I get the error message every time that a result is reported, even if it's the first instance for that sample. From what I can tell, the issue is that the SQL statement WHERE clause does not filter the records to only those where ysnReport=True.
Can anyone explain why I can do LookupRecord in a query but not LookupRecord in an identical SQL statement? Thanks for the input.
If you want things as sleek as possible, at least performance-wise, a stored query should, in principle, outperform dynamic SQL.
Syntax-wise, I'm not familiar with the macro constructs, but I'd consider enclosing the select statement in parentheses if it accepts them and also adding an explicit alias. I suspect that alias would in turn need to be referenced in your WHERE condition:
Where Condition = MySelect.[strSampleName]=[tblResult].[strSampleName]
Alias MySelect
I found the solution to my problem. The SQL statement where clause needed to be moved to the LookupRecord data block Where condition. This works:
If Updated("ysnReport") And Old.[ysnReport]=False Then
Look Up A Record In SELECT tblResult.strSampleName, tblResult.ysnReport FROM tblResult;
Where Condition = [ysnReport]=True And [strSampleName]=[tblResult].[strSampleName]
Alias
RaiseError
Error Number 1
Error Description This sample is already being reported.
End If

why would you use WHERE 1=0 statement in SQL?

I saw a query run in a log file on an application. and it contained a query like:
SELECT ID FROM CUST_ATTR49 WHERE 1=0
what is the use of such a query that is bound to return nothing?
A query like this can be used to ping the database. The clause:
WHERE 1=0
Ensures that non data is sent back, so no CPU charge, no Network traffic or other resource consumption.
A query like that can test for:
server availability
CUST_ATTR49 table existence
ID column existence
Keeping a connection alive
Cause a trigger to fire without changing any rows (with the where clause, but not in a select query)
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
This may be also used to extract the table schema from a table without extracting any data inside that table. As Andrea Colleoni said those will be the other benefits of using this.
A usecase I can think of: you have a filter form where you don't want to have any search results. If you specify some filter, they get added to the where clause.
Or it's usually used if you have to create a sql query by hand. E.g. you don't want to check whether the where clause is empty or not..and you can just add stuff like this:
where := "WHERE 0=1"
if X then where := where + " OR ... "
if Y then where := where + " OR ... "
(if you connect the clauses with OR you need 0=1, if you have AND you have 1=1)
As an answer - but also as further clarification to what #AndreaColleoni already mentioned:
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
Purpose as an on/off switch
I am using this as a switch (on/off) statement for portions of my Query.
If I were to use
WHERE 1=1
AND (0=? OR first_name = ?)
AND (0=? OR last_name = ?)
Then I can use the first bind variable (?) to turn on or off the first_name search criterium. , and the third bind variable (?) to turn on or off the last_name criterium.
I have also added a literal 1=1 just for esthetics so the text of the query aligns nicely.
For just those two criteria, it does not appear that helpful, as one might thing it is just easier to do the same by dynamically building your WHERE condition by either putting only first_name or last_name, or both, or none. So your code will have to dynamically build 4 versions of the same query. Imagine what would happen if you have 10 different criteria to consider, then how many combinations of the same query will you have to manage then?
Compile Time Optimization
I also might add that adding in the 0=? as a bind variable switch will not work very well if all your criteria are indexed. The run time optimizer that will select appropriate indexes and execution plans, might just not see the cost benefit of using the index in those slightly more complex predicates. Hence I usally advice, to inject the 0 / 1 explicitly into your query (string concatenating it in in your sql, or doing some search/replace). Doing so will give the compiler the chance to optimize out redundant statements, and give the Runtime Executer a much simpler query to look at.
(0=1 OR cond = ?) --> (cond = ?)
(0=0 OR cond = ?) --> Always True (ignore predicate)
In the second statement above the compiler knows that it never has to even consider the second part of the condition (cond = ?), and it will simply remove the entire predicate. If it were a bind variable, the compiler could never have accomplished this.
Because you are simply, and forcedly, injecting a 0/1, there is zero chance of SQL injections.
In my SQL's, as one approach, I typically place my sql injection points as ${literal_name}, and I then simply search/replace using a regex any ${...} occurrence with the appropriate literal, before I even let the compiler have a stab at it. This basically leads to a query stored as follows:
WHERE 1=1
AND (0=${cond1_enabled} OR cond1 = ?)
AND (0=${cond2_enabled} OR cond2 = ?)
Looks good, easily understood, the compiler handles it well, and the Runtime Cost Based Optimizer understands it better and will have a higher likelihood of selecting the right index.
I take special care in what I inject. Prime way for passing variables is and remains bind variables for all the obvious reasons.
This is very good in metadata fetching and makes thing generic.
Many DBs have optimizer so they will not actually execute it but its still a valid SQL statement and should execute on all DBs.
This will not fetch any result, but you know column names are valid, data types etc. If it does not execute you know something is wrong with DB(not up etc.)
So many generic programs execute this dummy statement for testing and fetching metadata.
Some systems use scripts and can dynamically set selected records to be hidden from a full list; so a false condition needs to be passed to the SQL. For example, three records out of 500 may be marked as Privacy for medical reasons and should not be visible to everyone. A dynamic query will control the 500 records are visible to those in HR, while 497 are visible to managers. A value would be passed to the SQL clause that is conditionally set, i.e. ' WHERE 1=1 ' or ' WHERE 1=0 ', depending who is logged into the system.
quoted from Greg
If the list of conditions is not known at compile time and is instead
built at run time, you don't have to worry about whether you have one
or more than one condition. You can generate them all like:
and
and concatenate them all together. With the 1=1 at the start, the
initial and has something to associate with.
I've never seen this used for any kind of injection protection, as you
say it doesn't seem like it would help much. I have seen it used as an
implementation convenience. The SQL query engine will end up ignoring
the 1=1 so it should have no performance impact.
Why would someone use WHERE 1=1 AND <conditions> in a SQL clause?
If the user intends to only append records, then the fastest method is open the recordset without returning any existing records.
It can be useful when only table metadata is desired in an application. For example, if you are writing a JDBC application and want to get the column display size of columns in the table.
Pasting a code snippet here
String query = "SELECT * from <Table_name> where 1=0";
PreparedStatement stmt = connection.prepareStatement(query);
ResultSet rs = stmt.executeQuery();
ResultSetMetaData rsMD = rs.getMetaData();
int columnCount = rsMD.getColumnCount();
for(int i=0;i<columnCount;i++) {
System.out.println("Column display size is: " + rsMD.getColumnDisplaySize(i+1));
}
Here having a query like "select * from table" can cause performance issues if you are dealing with huge data because it will try to fetch all the records from the table. Instead if you provide a query like "select * from table where 1=0" then it will fetch only table metadata and not the records so it will be efficient.
Per user milso in another thread, another purpose for "WHERE 1=0":
CREATE TABLE New_table_name as select * FROM Old_table_name WHERE 1 =
2;
this will create a new table with same schema as old table. (Very
handy if you want to load some data for compares)
An example of using a where condition of 1=0 is found in the Northwind 2007 database. On the main page the New Customer Order and New Purchase Order command buttons use embedded macros with the Where Condition set to 1=0. This opens the form with a filter that forces the sub-form to display only records related to the parent form. This can be verified by opening either of those forms from the tree without using the macro. When opened this way all records are displayed by the sub-form.
In ActiveRecord ORM, part of RubyOnRails:
Post.where(category_id: []).to_sql
# => SELECT * FROM posts WHERE 1=0
This is presumably because the following is invalid (at least in Postgres):
select id FROM bookings WHERE office_id IN ()
It seems like, that someone is trying to hack your database. It looks like someone tried mysql injection. You can read more about it here: Mysql Injection

Can scalar functions be applied before filtering when executing a SQL Statement?

I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is getting called on data that should have been filtered out by the WHERE clause. But it appears that the substring call is being applied before the filtering happens, the query is failing.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns and having 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this would give me an error, because when it reaches "X" it will try using a negative number for in the substring call, and it will fail.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular Database engine or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names, (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so then it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
Martin gave this link that pretty much explains what is going on - the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it i will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
TSQL divide by zero encountered despite no columns containing 0
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the select clause. I guess I'll have to go find the SQL standard myself and see if i can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsfroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
Build a working table from all of
the table constructors in the FROM
clause.
Remove from the working table those
rows that do not satisfy the WHERE
clause.
Construct the expressions in the
SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
You are thinking about something called query execution plan. It's based on query optimization rules, indexes, temporaty buffers and execution time statistics. If you are using SQL Managment Studio you have toolbox over your query editor where you can look at estimated execution plan, it shows how your query will change to gain some speed. So if just used your Name table and it is in buffer, engine might first try to subquery your data, and then join it with other table.

How to cope with null results in SQL Tasks that return single rows in SSIS 2005?

In a dataflow task, I can slip a rowcount into the processing flow and place the count into a variable. I can later use that variable to conditionally perform some other work if the rowcount was > 0. This works well for me, but I have no corresponding strategy for sql tasks expected to return a single row. In that event, I'm returning those values into variables. If the lookup produces no rows, the sql task fails when assigning values into those variables. I can branch on that component failing, but there's a side effect of that - if I'm running the job as a SQL server agent job step, the step returns DTSER_FAILURE, causing the step to fail. I can tell the sql agent to disregard the step failure, but then I won't know if I have a legitimate error in that step. This seems harder than it should be.
The only strategy I can think of is to run the same query with a count(*) aggregate and test if that returns a number > 0 and if so running the query again without the count. That's ugly because I have the same query in two places that I need to keep in sync.
Is there a better way?
In that same condition you can have additional logic (&& or ||). I would take one of the variables for your single statement and say something to the effect:
If #User::rowcount>0 || #User:single_record_var!=Default
That should help.
What kind of SQL statement? Can you change it to still return a single row with all NULLs instead of no rows?
What stops it from returning more than one row? The package would fail if it ended up returning more than one row, right?
You could also change it to call a stored procedure, and then call the stored procedure in two places without code duplication. You could also change it to be a view or user-defined function (if parameters are needed), SELECT COUNT(*) FROM udf() to check if there is data, SELECT * FROM udf() to get the row.