Search and replace part of string in database - what are the pitfalls? - sql

I need to replace all occurrences "google.com" that are met in the SQL db table Column1 with "newurl". It can be a full cell value, a part of it (substring of varchar()), can be met even several times in a cell.
Based on SO answer search-and-replace-part-of-string-in-database
this is what I need:
UPDATE
MyTable
SET
Column1 = Replace(Column, 'google.com', 'newurl')
WHERE
xxx
However, in that answer it is mentioned that
You will want to be extremely careful when doing this! I highly recommend doing a backup first.
What are the pitfalls of doing this query? Looks like it does the same what any texteditor would do by clicking on Replace All button. I don't think it is possible in my case to check the errors even with reserve copy as I would like to know possible errors in advance.
Any reasons to be careful with this query?
Again, I expect it replaces all occurences of google.com with 'newurl' in the Column1 of MyTable table in the SQL db.
Thank you.

Just create a test table, as a replica of your original source table, complete the update on there and check results.
You would want to do this as good SQL programming practice to ensure you don't mess up columns that should not be updated.
Another thing you can do is get a count of the records before hand that fit the criteria using a SELECT statement.
Run your update statement and if it's a 1-1 match on count, you should be good to go.
The only thing i can think of that would happen negatively in this respect is that additional columns get updated. Your WHERE clause is not specific for us to see, so there's no way to validate that what you're doing will do what you expect it to.

I think the person posting the answer is just being cautious - This will modify the value in Column1 for every row in MyTable, so make sure you mean it when you execute. Another way to be cautious would be to wrap it in a transaction so you could roll it back if you don't like the results.

Related

Deleting in SQL using multiple conditions

Some background, I have a code column that is char(6). In this field, I have the values of 0,00,000,0000,000000,000000. It seems illogical but that's how it is. What i need to do is delete all rows that possess these code values. I know how to do it individually as such
delete from [dbo.table] where code='0'
delete from [dbo.table] where code='00'
and so on.
How does one do this one section of code instead of 6
Try this:
delete from [dbo.table] where code='0'
or code='00'
or code='000'
etc. You get the idea.
There can be more efficient ways when the set of vales gets larger, but your 5 or 6 values is still quite a ways from that.
Update:
If your list grows long, or if your table is significantly larger than can reside in cache, you will likely see a significant performance gain by storing your selection values into an indexed temporary table and joining to it.
It strongly depends on your DBMS, but I suggest to use regular expressions. For example, with MySQL you just need simple query like this:
delete from dbo.table where code regexp '(0+)'
For most of popular DBMS you can do the same, but syntax may be various
I can't test it right now, but the following should work:
DELETE FROM dbo.table WHERE CONVERT(int, code) = 0
edit- Just thought of another way, that should be safer:
DELETE FROM dbo.table WHERE LEN(code) > 0 AND LEFT(code + '0000000000', 10) = '0000000000'

Query to Find Adjacent Date Records

There exists in my database a page_history table; the idea is that whenever a record in the page table is changed, that record's old values are stored in the history table.
My job now is to find occasions in which a record was changed, and retrieve the pre- and post-conditions of that change. Specifically, I want to know when a page changed groups, and what groups were involved in the change. The query I have below can find these instances, but with the use of the min function, I can only get back the values that match between the two records:
select page_id,
original_group,
min(created2) change_date
from (select h.page_id,
h.group_id original_group,
i.group_id new_group,
h.created_dttm created1,
i.created_dttm created2
from page_history h,
page_history i
where h.page_id = i.page_id
and h.created_dttm < i.created_dttm
and h.group_id != i.group_id)
group by page_id, original_group, created1
order by page_id
When I try to get, say, any details of the second record, like new_group, I'm hit with a ORA-00979: not a GROUP BY expression error. I don't want to group by new_group, though, because that's going to destroy the logic (I think it would find records displaying times a page changed from a group to another group, regardless of any changes to other groups in between).
My question, then, is how can I modify this query, or go about writing a new one, that achieves a similar end, but with the added availability of columns that do not match between the two records? In essence, how can I find that min record without sacrificing all the other columns I'm not trying to compare? I don't exactly need a complete answer, any suggestions that point me in the right direction would be appreciated.
I use PL/SQL Developer, and it looks like version 11.2.0.2.0 of Oracle.
EDIT: I have found a solution. It's not pretty, and I'd still like to see some alternatives, but if helping me out would threaten to explode your brain, I would advise relocating to an easier question.
Without seeing your table structure it's hard to re-write the query but when you have a min function used like that it invariably seems better to put it into a separate sub select to get what you want and then compare the result of that.

How you get a list of updated columns in SQL server trigger?

I want to know what columns where updated during update operation on a triger on first scaaning books online it looks like COLUMNS_UPDATED is the perfect solution but this function actualy don't check if values has changed , it check only what columns where selected in update clause, any one has other suggestions ?
The only way you can check if the values have changed is to compare the values in the DELETED and INSERTED virtual tables within the trigger. SQL doesn't check the existing value before updating to the new one, it will happily write a new identical value over the top - in other words, it takes your word for the update and tracks the update rather than actual changes.
We can use Update function to find if a particular column is updated:
IF UPDATE(ColumnName)
Refer to this link for details: http://msdn.microsoft.com/en-us/library/ms187326.aspx
As the others have posted, you'll need to interrogate INSERTED and DELETED. The only other useful bit of advice might be that you can get only the rows that have changed values (and discard the rows that didn't change) by using the EXCEPT operator - like this:
SELECT * FROM Inserted
EXCEPT
SELECT * FROM Deleted
The only way I can think of is that you can compare the values in DELETED and INSERTED to see which columns have changed.
Doesn't seem a particularly elegant solution though.
I asked this same question!
The previous posters are correct -- without directly comparing the values, you can't tell for sure whether the data has actually changed or not. However, there are several ways to do this type of checking, depending on what else you're trying to do in the trigger. My question has some good advice in the answers about those different mechanisms and their tradeoffs.

Adding update SQL queries

I have a script that updates itself every week. I've got a warning from my hosting that I've been overloading the server with the script. The problem, I've gathered is that I use too many UPDATE queries (one for each of my 8000+ users).
It's bad coding, I know. So now I need to lump all the data into one SQL query and update it all at once. I hope that is what will fix my problem.
A quick question. If I add purely add UPDATE queries separated by a semicolon like this:
UPDATE table SET something=3 WHERE id=8; UPDATE table SET something=6 WHERE id=9;
And then update the database with one large SQL code as opposed to querying the database for each update, it will be faster right?
Is this the best way to "bunch" together UPDATE statements? Would this significantly reduce server load?
Make a delimited file with your values and use your equivalent of MySQL's LOAD DATA INFILE. This will be significantly faster than an UPDATE.
LOAD DATA INFILE '/path/to/myfile'
REPLACE INTO TABLE thetable(field1,field2, field3)
//optional field and line delimiters
;
Your best bet is to batch these statements by your "something" field:
UPDATE table SET something=3 WHERE id IN (2,4,6,8)
UPDATE table SET something=4 WHERE id IN (1,3,5,7)
Of course, knowing nothing about your requirements, there is likely a better solution out there...
It will improve IO since there is only one round trip, but the database "effort" will be the same.
A curiosity of SQL is that the following integer expression
(1 -abs(sign(A - B))) = 1 if A == B and 0 otherwise. For convenience lets call this expression _eq(A,B).
So
update table set something = 3*_eq(id,8) + 6* _eq(id,9)
where id in (8,9);
will do what you want with a single update statement.

SQL Super Search

Does anyone have a good method for searching an entire database for a given value?
I have a specific string I'm looking for, it's in TableA, and it's also a FK to some other table, TableB, except I don't know which table/column that is.
Assuming there's a jillion tables and I don't want to look through them all, and maybe will have to do this in several different cases, what would be the best way?
Since I didn't want a Code-SQL bridge, my only all-SQL idea was:
select tablename and column_name from INFORMATION_SCHEMA.COLUMNS
...then use a cursor to flip through all the columns, and for all the datatypes of nvarchar I would execute dynamic SQL like:
SELECT * from #table where #column = #myvalue
Needless to say, this is slow AND a memory hog.
Anyone got any ideas?
Dump the database and grep?
I guess a more focused question might be: if you don't know how the schema works, what are you going to do with the answer you get anyway?
Here are a couple of links talking about how to do this:
http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/the-ten-most-asked-sql-server-questions--1#2
http://vyaskn.tripod.com/search_all_columns_in_all_tables.htm
Both of them use the approach you were hoping to avoid. Refine them so that they only searched columns that were foreign keys should improve their performance by eliminating the searching of unnecessary tables.
Here's a solution I wrote several years ago:
http://www.users.drew.edu/skass/sql/SearchAllTables.sql.txt
Just make SP that searches in all relevant columns using OR.
Why don't you know which columns to search on?
If the list of columns is ever-shifting, then you just need to make sure that whatever process results in changing the schema would result in the change in this stored procedure.
If the list of the columns is just too dang big for you to type up inot the SP, use some elementary perl/grep/whatnot to do it in 1 line, e.g for SYBASE.
my_dump_table_schema.pl|egrep "( CHAR| VARCHAR)"|awk '{$1}'|tr "\012" " "|perl -pe '{s/ / = \#SEARCH_VALUE OR /g}'; echo ' = #SEARCH_VALUE'
The last echo is needed to add the value to last column
to dump your data, read up on the bcp Utility