SQLServer - MERGE with condition is slow, even an "always false" condition - sql

I want to execute the MERGE statement conditionally, so it won't try to match the entire target-table.
My original statement was kinda like this:
MERGE [target_table] USING [table_source]
ON (([target_table].[ID] = [table_source].[ID]) AND (condition))
WHEN MATCHED THEN UPDATE
SET [table_source].[_strField1] = [table_source].[_strField2];
Note: assume '_strField' to be typed as nvarchar(4000), and 'condition' to be something like [target_table].[_strField8] = 'sometext'.
But then I've encountered the following warning in the documentation that dictates "...Do not attempt to improve query performance by filtering out rows in the target table in the ON clause".
So my original query was altered to the following one
MERGE [target_table] USING [table_source]
ON (([target_table].[ID] = [table_source].[ID]))
WHEN MATCHED AND (condition)
THEN UPDATE
SET [table_source].[_strField] = [table_source].[_strField];
The problem is, that the query now takes a lot more time. Even changing the condition to be "always false", such as 1 = 2 doesn't help at all. On the other hand, setting different fields, such as
SET [table_source].[_intField] = [table_source].[_intField];
or any other types other than two nvarchar(4000)s causes the statement to be executed much faster.
To conclude, the things I don't understand are:
If the data-setting of nvarchar(4000) is the longer process, why setting the condition to be "1 = 2" doesn't speed up the execution time?
If the "row-matching" is the longer process, why setting INT fields does speed up the execution time?

According to sql server documentation:
"...Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results..*
your first query is invalid. You should not take time results into consideration.
If you update column with fixed lenght (int,date) there will be no "Row-Overflow Data Exceeding 8 KB" situation. This situation can occurs when you use nvarchar(4000) and give poor query performance.
We do not now how the merge function works inside and how it process data. So propably only developers of this function can give You answers to your question.
I hope that I have help you with varchar(4000) performance issue.
Marcin Pazgier

Related

Confused on what to use, If Then or Case Else Statement

I am trying to determine if I should use a CASE Statement or an IF THEN statement to get my results.
I want a SQL statement to run when a certain condition exists, but am not certain on how to check for the condition. Here is what I am working on
IF EXISTS(SELECT source FROM map WHERE rev_num =(SELECT MAX(rev_num) from MAP <-- at this point it would return either an A or B -->
What ever the answer is i then need to run a set of SQL's. So for A it would do this set of statements and for B it would do another.
CASE is used within a SQL statement. IF/THEN can be used to choose which query to execute.
Based on your somewhat vague example it seems like you want to execute different queries based on some condition. In that case, an IF/THEN seems more appropriate.
If, however, the majority of each query is identical and you're just changing part of the query then you may be able to use CASE to reduce the amount of duplicate code.

why would you use WHERE 1=0 statement in SQL?

I saw a query run in a log file on an application. and it contained a query like:
SELECT ID FROM CUST_ATTR49 WHERE 1=0
what is the use of such a query that is bound to return nothing?
A query like this can be used to ping the database. The clause:
WHERE 1=0
Ensures that non data is sent back, so no CPU charge, no Network traffic or other resource consumption.
A query like that can test for:
server availability
CUST_ATTR49 table existence
ID column existence
Keeping a connection alive
Cause a trigger to fire without changing any rows (with the where clause, but not in a select query)
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
This may be also used to extract the table schema from a table without extracting any data inside that table. As Andrea Colleoni said those will be the other benefits of using this.
A usecase I can think of: you have a filter form where you don't want to have any search results. If you specify some filter, they get added to the where clause.
Or it's usually used if you have to create a sql query by hand. E.g. you don't want to check whether the where clause is empty or not..and you can just add stuff like this:
where := "WHERE 0=1"
if X then where := where + " OR ... "
if Y then where := where + " OR ... "
(if you connect the clauses with OR you need 0=1, if you have AND you have 1=1)
As an answer - but also as further clarification to what #AndreaColleoni already mentioned:
manage many OR conditions in dynamic queries (e.g WHERE 1=0 OR <condition>)
Purpose as an on/off switch
I am using this as a switch (on/off) statement for portions of my Query.
If I were to use
WHERE 1=1
AND (0=? OR first_name = ?)
AND (0=? OR last_name = ?)
Then I can use the first bind variable (?) to turn on or off the first_name search criterium. , and the third bind variable (?) to turn on or off the last_name criterium.
I have also added a literal 1=1 just for esthetics so the text of the query aligns nicely.
For just those two criteria, it does not appear that helpful, as one might thing it is just easier to do the same by dynamically building your WHERE condition by either putting only first_name or last_name, or both, or none. So your code will have to dynamically build 4 versions of the same query. Imagine what would happen if you have 10 different criteria to consider, then how many combinations of the same query will you have to manage then?
Compile Time Optimization
I also might add that adding in the 0=? as a bind variable switch will not work very well if all your criteria are indexed. The run time optimizer that will select appropriate indexes and execution plans, might just not see the cost benefit of using the index in those slightly more complex predicates. Hence I usally advice, to inject the 0 / 1 explicitly into your query (string concatenating it in in your sql, or doing some search/replace). Doing so will give the compiler the chance to optimize out redundant statements, and give the Runtime Executer a much simpler query to look at.
(0=1 OR cond = ?) --> (cond = ?)
(0=0 OR cond = ?) --> Always True (ignore predicate)
In the second statement above the compiler knows that it never has to even consider the second part of the condition (cond = ?), and it will simply remove the entire predicate. If it were a bind variable, the compiler could never have accomplished this.
Because you are simply, and forcedly, injecting a 0/1, there is zero chance of SQL injections.
In my SQL's, as one approach, I typically place my sql injection points as ${literal_name}, and I then simply search/replace using a regex any ${...} occurrence with the appropriate literal, before I even let the compiler have a stab at it. This basically leads to a query stored as follows:
WHERE 1=1
AND (0=${cond1_enabled} OR cond1 = ?)
AND (0=${cond2_enabled} OR cond2 = ?)
Looks good, easily understood, the compiler handles it well, and the Runtime Cost Based Optimizer understands it better and will have a higher likelihood of selecting the right index.
I take special care in what I inject. Prime way for passing variables is and remains bind variables for all the obvious reasons.
This is very good in metadata fetching and makes thing generic.
Many DBs have optimizer so they will not actually execute it but its still a valid SQL statement and should execute on all DBs.
This will not fetch any result, but you know column names are valid, data types etc. If it does not execute you know something is wrong with DB(not up etc.)
So many generic programs execute this dummy statement for testing and fetching metadata.
Some systems use scripts and can dynamically set selected records to be hidden from a full list; so a false condition needs to be passed to the SQL. For example, three records out of 500 may be marked as Privacy for medical reasons and should not be visible to everyone. A dynamic query will control the 500 records are visible to those in HR, while 497 are visible to managers. A value would be passed to the SQL clause that is conditionally set, i.e. ' WHERE 1=1 ' or ' WHERE 1=0 ', depending who is logged into the system.
quoted from Greg
If the list of conditions is not known at compile time and is instead
built at run time, you don't have to worry about whether you have one
or more than one condition. You can generate them all like:
and
and concatenate them all together. With the 1=1 at the start, the
initial and has something to associate with.
I've never seen this used for any kind of injection protection, as you
say it doesn't seem like it would help much. I have seen it used as an
implementation convenience. The SQL query engine will end up ignoring
the 1=1 so it should have no performance impact.
Why would someone use WHERE 1=1 AND <conditions> in a SQL clause?
If the user intends to only append records, then the fastest method is open the recordset without returning any existing records.
It can be useful when only table metadata is desired in an application. For example, if you are writing a JDBC application and want to get the column display size of columns in the table.
Pasting a code snippet here
String query = "SELECT * from <Table_name> where 1=0";
PreparedStatement stmt = connection.prepareStatement(query);
ResultSet rs = stmt.executeQuery();
ResultSetMetaData rsMD = rs.getMetaData();
int columnCount = rsMD.getColumnCount();
for(int i=0;i<columnCount;i++) {
System.out.println("Column display size is: " + rsMD.getColumnDisplaySize(i+1));
}
Here having a query like "select * from table" can cause performance issues if you are dealing with huge data because it will try to fetch all the records from the table. Instead if you provide a query like "select * from table where 1=0" then it will fetch only table metadata and not the records so it will be efficient.
Per user milso in another thread, another purpose for "WHERE 1=0":
CREATE TABLE New_table_name as select * FROM Old_table_name WHERE 1 =
2;
this will create a new table with same schema as old table. (Very
handy if you want to load some data for compares)
An example of using a where condition of 1=0 is found in the Northwind 2007 database. On the main page the New Customer Order and New Purchase Order command buttons use embedded macros with the Where Condition set to 1=0. This opens the form with a filter that forces the sub-form to display only records related to the parent form. This can be verified by opening either of those forms from the tree without using the macro. When opened this way all records are displayed by the sub-form.
In ActiveRecord ORM, part of RubyOnRails:
Post.where(category_id: []).to_sql
# => SELECT * FROM posts WHERE 1=0
This is presumably because the following is invalid (at least in Postgres):
select id FROM bookings WHERE office_id IN ()
It seems like, that someone is trying to hack your database. It looks like someone tried mysql injection. You can read more about it here: Mysql Injection

Large number of UPDATE queries slowing down page

I am reading and validating large fixed-width text files (range from 10-50K lines) that are submitted via our ASP.net website (coded in VB.Net). I do an initial scan of the file to check for basic issues (line length, etc). Then I import each row into a MS SQL table. Each DB rows basically consists of a record_ID (Primary, auto-incrementing) and about 50 varchar fields.
After the insert is done, I run a validation function on the file that checks each field in each row based on a bunch of criteria (trimmed length, isnumeric, range checks, etc). If it finds an error in any field, it inserts a record into the Errors table, which has an error_ID, the record_ID and an error message. In addition, if the field fails in a particular way, I have to do a "reset" on that field. A reset might consist of blanking the entire field, or simply replacing the value with another value (e.g. replacing the string with a new one that has all illegals chars taken out).
I have a 5,000 line test file. The upload, initial check, and import takes about 5-6 seconds. The detailed error check and insert into the Errors table takes about 5-8 seconds (this file has about 1200 errors in it). However, the "resets" part takes about 40-45 seconds for 750 fields that need to be reset. When I comment out the resets function (returning immediately without actually calling the UPDATE stored proc), the process is very fast. With the resets turned on, the pages take 50 seconds to return.
My UPDATE stored proc is using some recommended code from http://sommarskog.se/dynamic_sql.html, whereby it uses CASE instead of dynamic SQL:
UPDATE dbo.Records
SET dbo.Records.file_ID = CASE #field_name WHEN 'file_ID' THEN #field_value ELSE file_ID END,
.
. (all 50 varchar field CASE statements here)
.
WHERE dbo.Records.record_ID = #record_ID
Is there any way I can help my performance here. Can I somehow group all of these UPDATE calls into a single transaction? Should I be reworking the UPDATE query somehow? Or is it just sheer quantity of 750+ UPDATEs and things are just slow (it's a quad proc server with 8GB ram).
Any suggestions appreciated.
Don't do this in sql; fix the data up in code, then do you updates.
If you have sql 2008, then look into table-value parameters. It enables you to pass an entire table as a parameter to a s'proc. From their you just have the one insert/update or merge statement
If your looping through the lines and doing individual updates/inserts this can be really expensive... Consider using SqlBulkCopy which can speed up all your inserts. Similarly, you can create a DataSet, make your updates on the dataset and then submit them all in one shot through a SqlDataAdapter.
I believe you are doing 50 case statements on every update. Sounds like that would be slow.
It is possible to solve this problem with inject proof code via parameterized querys and a string constant table.
Quick and dirty example code.
string [] queryList = { "UPDATE records SET col1 = {val} WHERE ID={key}",
"UPDATE records SET col2 = {val} WHERE ID={key}",
"UPDATE records SET col3 = {val} WHERE ID={key}",
...
"UPDATE records SET col50 = {val} WHERE ID={key}"}
Then in your call to SQL you just pick the item in the array corresponding to the col you want to update and set the value and key for the parameterized items.
I'm guessing you will see a significant improvement... let me know how it goes.
Um. Why are you inserting numeric data into VARCHAR fields then trying to run numeric checks on it? This is yucky.
Apply correct data typing and constraints to your table, do the INSERT, and see if it failed. SQL Server will happily report errors back to you.
I would try changing the recovery model to simple and look at my indexes. Kimberly Tripp did a session showing a scenario with improved performance using a heap.

How bad is my query?

Ok I need to build a query based on some user input to filter the results.
The query basically goes something like this:
SELECT * FROM my_table ORDER BY ordering_fld;
There are four text boxes in which users can choose to filter the data, meaning I'd have to dynamically build a "WHERE" clause into it for the first filter used and then "AND" clauses for each subsequent filter entered.
Because I'm too lazy to do this, I've just made every filter an "AND" clause and put a "WHERE 1" clause in the query by default.
So now I have:
SELECT * FROM my_table WHERE 1 {AND filters} ORDER BY ordering_fld;
So my question is, have I done something that will adversely affect the performance of my query or buggered anything else up in any way I should be remotely worried about?
MySQL will optimize your 1 away.
I just ran this query on my test database:
EXPLAIN EXTENDED
SELECT *
FROM t_source
WHERE 1 AND id < 100
and it gave me the following description:
select `test`.`t_source`.`id` AS `id`,`test`.`t_source`.`value` AS `value`,`test`.`t_source`.`val` AS `val`,`test`.`t_source`.`nid` AS `nid` from `test`.`t_source` where (`test`.`t_source`.`id` < 100)
As you can see, no 1 at all.
The documentation on WHERE clause optimization in MySQL mentions this:
Constant folding:
(a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
Constant condition removal (needed because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
Note 5 = 5 and 5 = 6 parts in the example above.
You can EXPLAIN your query:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
and see if it does anything differently, which I doubt. I would use 1=1, just so it is more clear.
You might want to add LIMIT 1000 or something, when no parameters are used and the table gets large, will you really want to return everything?
WHERE 1 is a constant, deterministic expression which will be "optimized out" by any decent DB engine.
If there is a good way in your chosen language to avoid building SQL yourself, use that instead. I like Python and Django, and the Django ORM makes it very easy to filter results based on user input.
If you are committed to building the SQL yourself, be sure to sanitize user inputs against SQL injection, and try to encapsulate SQL building in a separate module from your filter logic.
Also, query performance should not be your concern until it becomes a problem, which it probably won't until you have thousands or millions of rows. And when it does come time to optimize, adding a few indexes on columns used for WHERE and JOIN goes a long way.
TO improve performance, use column indexes on fields listen in "WHERE"
Standard SQL Injection Disclaimers here...
One thing you could do, to avoid SQL injection since you know it's only four parameters is use a stored procedure where you pass values for the fields or NULL. I am not sure of mySQL stored proc syntax, but the query would boil down to
SELECT *
FROM my_table
WHERE Field1 = ISNULL(#Field1, Field1)
AND Field2 = ISNULL(#Field2, Field2)
...
ORDRE BY ordering_fld
We've been doing something similiar not too long ago and there're a few things that we observed:
Setting up the indexes on the columns we were (possibly) filtering, improved performance
The WHERE 1 part can be left out completely if the filters're not used. (not sure if it applies to your case) Doesn't make a difference, but 'feels' right.
SQL injection shouldn't be forgotten
Also, if you 'only' have 4 filters, you could build up a stored procedure and pass in null values and check for them. (just like n8wrl suggested in the meantime)
That will work - some considerations:
About dynamically built SQL in general, some databases (Oracle at least) will cache execution plans for queries, so if you end up running the same query many times it won't have to completely start over from scratch. If you use dynamically built SQL, you are creating a different query each time so to the database it will look like 100 different queries instead of 100 runs of the same query.
You'd probably just need to measure the performance to find out if it works well enough for you.
Do you need all the columns? Explicitly specifying them is probably better than using * anyways because:
You can visually see what columns are being returned
If you add or remove columns to the table later, they won't change your interface
Not bad, i didn't know this snippet to get rid of the 'is it the first filter 3' question.
Tho you should be ashamed of your code ( ^^ ), it doesn't do anything to performance as any DB Engine will optimize it.
The only reason I've used WHERE 1 = 1 is for dynamic SQL; it's a hack to make appending WHERE clauses easier by using AND .... It is not something I would include in my SQL otherwise - it does nothing to affect the query overall because it always evaluates as being true and does not hit the table(s) involved so there aren't any index lookups or table scans based on it.
I can't speak to how MySQL handles optional criteria, but I know that using the following:
WHERE (#param IS NULL OR t.column = #param)
...is the typical way of handling optional parameters. COALESCE and ISNULL are not ideal because the query is still utilizing indexes (or worse, table scans) based on a sentinel value. The example I provided won't hit the table unless a value has been provided.
That said, my experience with Oracle (9i, 10g) has shown that it doesn't handle [ WHERE (#param IS NULL OR t.column = #param) ] very well. I saw a huge performance gain by converting the SQL to be dynamic, and used CONTEXT variables to determine what to add. My impression of SQL Server 2005 is that these are handled better.
I have usually done something like this:
for(int i=0; i<numConditions; i++) {
sql += (i == 0 ? "WHERE " : "AND ");
sql += dbFieldNames[i] + " = " + safeVariableValues[i];
}
Makes the generated query a little cleaner.
One alternative i sometimes use is to build the where clause an an array and then join them together:
my #wherefields;
foreach $c (#conditionfields) {
push #wherefields, "$c = ?",
}
my $sql = "select * from table";
if(#wherefields) { $sql.=" WHERE " . join (" AND ", #wherefields); }
The above is written in perl, but most languages have some kind of join funciton.

Adding update SQL queries

I have a script that updates itself every week. I've got a warning from my hosting that I've been overloading the server with the script. The problem, I've gathered is that I use too many UPDATE queries (one for each of my 8000+ users).
It's bad coding, I know. So now I need to lump all the data into one SQL query and update it all at once. I hope that is what will fix my problem.
A quick question. If I add purely add UPDATE queries separated by a semicolon like this:
UPDATE table SET something=3 WHERE id=8; UPDATE table SET something=6 WHERE id=9;
And then update the database with one large SQL code as opposed to querying the database for each update, it will be faster right?
Is this the best way to "bunch" together UPDATE statements? Would this significantly reduce server load?
Make a delimited file with your values and use your equivalent of MySQL's LOAD DATA INFILE. This will be significantly faster than an UPDATE.
LOAD DATA INFILE '/path/to/myfile'
REPLACE INTO TABLE thetable(field1,field2, field3)
//optional field and line delimiters
;
Your best bet is to batch these statements by your "something" field:
UPDATE table SET something=3 WHERE id IN (2,4,6,8)
UPDATE table SET something=4 WHERE id IN (1,3,5,7)
Of course, knowing nothing about your requirements, there is likely a better solution out there...
It will improve IO since there is only one round trip, but the database "effort" will be the same.
A curiosity of SQL is that the following integer expression
(1 -abs(sign(A - B))) = 1 if A == B and 0 otherwise. For convenience lets call this expression _eq(A,B).
So
update table set something = 3*_eq(id,8) + 6* _eq(id,9)
where id in (8,9);
will do what you want with a single update statement.