Efficiency check in an SQL query

I'm torn between two ways of accomplishing my task.
I have a set of conditions, and when they are all true I need to set "true" in the "x" column of that record.
Which approach is more efficient and recommended, and why?
1. Set this column to "false" for all records in the table, then run another query to set "true" where the conditions hold.
2. Set "true" where all conditions are "true", then run another query to set "false" on all records where one or more of the conditions fails.
I cannot assume the "x" column has some default value that needs changing, because the query should also run once in a while, when needed, after initial values were inserted into that column, and some conditions may have changed since the last time I ran it.
Perhaps there is another idea, more efficient than the two above?
I'd also like to understand how to estimate the efficiency of a query, similar to the way complexity is analyzed in programming.

Probably the most efficient way is to use a CASE statement, since you evaluate the conditions only once per row and modify every row only once, e.g.
UPDATE tablename SET fieldname = (CASE WHEN conditions THEN 'true' ELSE 'false' END)
CASE statements are available at least in Oracle and MS SQL Server; I don't know about other DB vendors.
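To make that concrete, here is a minimal sketch of the single-pass approach; the table name (orders), flag column (x), and conditions are hypothetical:

-- Evaluate the conditions once per row; every row is written exactly once.
UPDATE orders
SET x = CASE
          WHEN status = 'shipped' AND amount > 0 THEN 'true'
          ELSE 'false'
        END;

Either two-query approach scans (and possibly rewrites) the table twice; the CASE version does one pass.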

If all conditions are similar in data structure, I don't think it really matters.
If the conditions have different query structures (for example, "my key isn't a foreign key in tableB" versus "State = 'California'"), setting everything to true and then checking the conditions one by one, as a sequence of SQL statements, is better, because you can check the simple ones first and not bother with the complex ones for records that already had their flag cleared; a sketch follows.
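A minimal sketch of that sequential approach; the table, columns, and conditions are hypothetical:

-- Pass 1: assume every row qualifies.
UPDATE tablename SET x = 'true';

-- Pass 2: the cheap check clears most flags first.
UPDATE tablename SET x = 'false'
WHERE State <> 'California';

-- Pass 3: the expensive check runs only against rows still flagged.
UPDATE tablename SET x = 'false'
WHERE x = 'true'
  AND EXISTS (SELECT 1 FROM tableB b WHERE b.fk = tablename.id);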

SQL-ish: how to change enormous code into an elegant one?

I have just one abhorrent table: no index, no keys, no IDs, no order, 25 columns, 19 million rows.
I am using the SQL-ish language named TaQL ("Table Query Language").
I need to select-from-where... It sounds like no problem!
However, the WHERE conditions are 1683 sets of simple conditions:
set#1: columnA>num1 and columnB>num2 and columnC<num3 and columnD>=num4 ...
or
set#2: columnA>num189 and columnB>num274 and columnC<num321 and columnD>=num457 ...
or
set#n: ...
or
set#1683: ....
My current code works fine, but it has 1683 lines in the WHERE statement; I generated it with awk and regular expressions.
Is there an elegant way to reduce such enormous code?
Have you ever tried the approach of creating a new data format that DID have some normalization to it, so you could import into your own database? Then you can add indexes, clean up the keep criteria, etc., and maybe even add a new column (or columns) at the end of your table, such as "KeepThis".
Then apply an update: UPDATE YourTable SET KeepThis = 1 WHERE your criteria, or maybe even set the value to the condition set it qualified under. Then you could query based on those values, or even on all rows NOT assigned a value, and see if there is any merit in the records not previously realized.
It sounds like a big task either way, but it might be a nice approach to have things pre-stamped and in a database you can manage, versus some other source; see the sketch below.
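A minimal sketch of that idea, assuming the data has been imported into a standard SQL database; the condition_sets table and its threshold columns are hypothetical stand-ins for the 1683 sets:

-- One row per condition set replaces 1683 lines of WHERE clause.
CREATE TABLE condition_sets (
  set_id INT PRIMARY KEY,
  a_min NUMERIC,
  b_min NUMERIC,
  c_max NUMERIC,
  d_min NUMERIC
);

-- Stamp each row with the lowest-numbered set it qualifies under
-- (NULL means no set matched, i.e. the row is not kept).
UPDATE YourTable
SET KeepThis = (
  SELECT MIN(cs.set_id)
  FROM condition_sets cs
  WHERE YourTable.columnA >  cs.a_min
    AND YourTable.columnB >  cs.b_min
    AND YourTable.columnC <  cs.c_max
    AND YourTable.columnD >= cs.d_min
);

Querying for KeepThis IS NULL then surfaces the rows no set claimed, i.e. the records "not previously realized" mentioned above.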

Search and replace part of string in database - what are the pitfalls?

I need to replace every occurrence of "google.com" in the Column1 column of an SQL database table with "newurl". It can be the full cell value or part of it (a substring of a varchar()), and it can even occur several times in one cell.
Based on the SO answer search-and-replace-part-of-string-in-database, this is what I need:
UPDATE
MyTable
SET
Column1 = Replace(Column1, 'google.com', 'newurl')
WHERE
xxx
However, in that answer it is mentioned that
You will want to be extremely careful when doing this! I highly recommend doing a backup first.
What are the pitfalls of this query? It looks like it does the same thing any text editor does when you click the Replace All button. In my case I don't think checking for errors afterwards, even against a backup copy, is practical; I would like to know the possible errors in advance.
Any reasons to be careful with this query?
Again, I expect it to replace all occurrences of google.com with 'newurl' in Column1 of the MyTable table in the SQL db.
Thank you.
Just create a test table as a replica of your original source table, run the update on there, and check the results.
You would want to do this as good SQL programming practice, to ensure you don't mess up columns that should not be updated.
Another thing you can do is get a count beforehand of the records that fit the criteria, using a SELECT statement.
Run your update statement, and if the affected row count is a 1:1 match with that count, you should be good to go.
The only negative thing I can think of in this respect is that additional columns get updated. You haven't shown your specific WHERE clause, so there's no way to validate that what you're doing will do what you expect.
I think the person posting the answer is just being cautious: this will modify the value in Column1 for every row in MyTable, so make sure you mean it when you execute. Another way to be cautious would be to wrap it in a transaction, so you can roll it back if you don't like the results; a sketch follows.
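A minimal sketch combining the count check and the transaction wrap (T-SQL syntax assumed; the LIKE filter is a hypothetical stand-in for the question's WHERE clause):

-- Count the rows that should change.
SELECT COUNT(*) AS expected FROM MyTable WHERE Column1 LIKE '%google.com%';

BEGIN TRANSACTION;

UPDATE MyTable
SET Column1 = Replace(Column1, 'google.com', 'newurl')
WHERE Column1 LIKE '%google.com%';

-- @@ROWCOUNT is the number of rows the UPDATE just touched.
SELECT @@ROWCOUNT AS rows_updated;

-- Compare rows_updated with expected, then pick one:
-- COMMIT TRANSACTION;    -- counts match, keep the change
-- ROLLBACK TRANSACTION;  -- something is off, undo it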

Confused about what to use: IF THEN or CASE ELSE statement

I am trying to determine whether I should use a CASE statement or an IF/THEN statement to get my results.
I want a SQL statement to run when a certain condition exists, but I am not certain how to check for the condition. Here is what I am working on:
IF EXISTS (SELECT source FROM map WHERE rev_num = (SELECT MAX(rev_num) FROM map)) -- at this point it would return either an A or B
Whatever the answer is, I then need to run a set of SQL statements. So for A it would do one set of statements, and for B it would do another.
CASE is used within a SQL statement. IF/THEN can be used to choose which query to execute.
Based on your somewhat vague example it seems like you want to execute different queries based on some condition. In that case, an IF/THEN seems more appropriate.
If, however, the majority of each query is identical and you're just changing part of the query then you may be able to use CASE to reduce the amount of duplicate code.
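A minimal sketch of the IF/THEN route in T-SQL; the map table and the source and rev_num columns come from the question, while the @source variable and the PRINT placeholders are hypothetical:

DECLARE @source CHAR(1);

-- Grab the source value for the highest revision number.
SELECT @source = source
FROM map
WHERE rev_num = (SELECT MAX(rev_num) FROM map);

IF @source = 'A'
BEGIN
    PRINT 'running the statements for A';  -- replace with the A statements
END
ELSE IF @source = 'B'
BEGIN
    PRINT 'running the statements for B';  -- replace with the B statements
END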

SQLServer - MERGE with condition is slow, even an "always false" condition

I want to execute the MERGE statement conditionally, so it won't try to match the entire target-table.
My original statement was kinda like this:
MERGE [target_table] USING [table_source]
ON (([target_table].[ID] = [table_source].[ID]) AND (condition))
WHEN MATCHED THEN UPDATE
SET [target_table].[_strField1] = [table_source].[_strField2];
Note: assume '_strField' to be typed as nvarchar(4000), and 'condition' to be something like [target_table].[_strField8] = 'sometext'.
But then I encountered the following warning in the documentation, which dictates: "...Do not attempt to improve query performance by filtering out rows in the target table in the ON clause".
So my original query was altered to the following one:
MERGE [target_table] USING [table_source]
ON (([target_table].[ID] = [table_source].[ID]))
WHEN MATCHED AND (condition)
THEN UPDATE
SET [target_table].[_strField] = [table_source].[_strField];
The problem is that the query now takes much longer. Even changing the condition to something always false, such as 1 = 2, doesn't help at all. On the other hand, setting different fields, such as
SET [target_table].[_intField] = [table_source].[_intField];
or any types other than two nvarchar(4000)s, causes the statement to execute much faster.
To conclude, the things I don't understand are:
If writing the nvarchar(4000) data is the slow part, why doesn't setting the condition to "1 = 2" speed up the execution?
If the row matching is the slow part, why does setting INT fields speed up the execution?
According to the SQL Server documentation:
"...Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results."
your first query is invalid, so you should not take its timing into consideration.
If you update a column with a fixed length (int, date), there is no "row-overflow data exceeding 8 KB" situation. That situation can occur when you use nvarchar(4000), and it gives poor query performance.
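A diagnostic sketch for checking that on SQL Server; the table name is from the question, and the query assumes access to the system catalog views:

-- List row-overflow allocation units for target_table.
SELECT au.type_desc, au.total_pages
FROM sys.allocation_units AS au
JOIN sys.partitions AS p ON au.container_id = p.hobt_id
JOIN sys.objects AS o ON p.object_id = o.object_id
WHERE o.name = 'target_table'
  AND au.type_desc = 'ROW_OVERFLOW_DATA';

A nonzero total_pages means some rows already push their variable-length data off the 8 KB page, which would match a slowdown that appears only with the nvarchar(4000) columns.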
We do not know how MERGE works inside and how it processes data, so probably only the developers of this feature can answer your questions.
I hope that I have helped you with the nvarchar(4000) performance issue.
Marcin Pazgier

Same query produces different results in SQL

On one machine we have a set of data. Say we have a column IsValid, which contains true or false, and another column which defines a group. For every group there can be only one row with IsValid true; the rest are false.
Now, when we run our query based on the group, the row with IsValid true comes first in the query result, and the rest of the rows have IsValid false.
We don't use any ORDER BY or GROUP BY here; we just use INNER JOIN and WHERE conditions.
The issue is that on our development and test servers we get the expected results, but on the live server the order is different (a row with IsValid false comes first), and we don't know why. The data is entirely different on all three servers (development, testing, and live), and they all run the same version, SQL Server 2005. Any suggestions please?
Please help,
Many thanks,
Byfour
The obvious answer is: different data on test, dev and live servers.
But even with the same data, without an ORDER BY clause, results are usually returned in clustered index order, BUT this is not guaranteed.
If you require results in a specific ordering, then you must use an ORDER BY clause.
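A minimal sketch of making the ordering explicit; the table and column names (mytable, GroupId, IsValid) are hypothetical stand-ins for the question's schema:

-- Force the single IsValid row to the top within each group.
SELECT *
FROM mytable
ORDER BY GroupId,
         CASE WHEN IsValid = 'true' THEN 0 ELSE 1 END;

With this ORDER BY, the result order no longer depends on which index or server happens to serve the query.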