Replace all occurrences of a substring in a database text field - sql

I have a database with around 10k records, and some of them contain HTML character references that I would like to replace.
For example, I can find all occurrences with:
SELECT * FROM TABLE
WHERE TEXTFIELD LIKE '%&#47%'
An example of the original string:
this is the cool mega string that contains &#47
How can I replace all &#47 with /?
The end result should be:
this is the cool mega string that contains /

If you want to replace a specific string with another string or a transformation of that string, you could use the replace function in PostgreSQL. For instance, to replace all occurrences of "cat" with "dog" in the column myfield, you would do the following (string literals take single quotes; double quotes denote identifiers in PostgreSQL):
UPDATE tablename
SET myfield = replace(myfield, 'cat', 'dog');
You could add a WHERE clause or any other logic as you see fit.
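A minimal, runnable sketch of the same UPDATE applied to the question's &#47 case. It uses SQLite through Python's sqlite3 module only because that needs no server (SQLite also has a replace(X, Y, Z) function); the table name posts is hypothetical, since TABLE itself is a reserved word. The WHERE clause limits the rewrite to rows that actually contain the substring.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the question's TABLE / TEXTFIELD.
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, textfield TEXT)")
conn.executemany(
    "INSERT INTO posts (textfield) VALUES (?)",
    [("this is the cool mega string that contains &#47",), ("no entity here",)],
)

# Same shape as the PostgreSQL statement; only matching rows are rewritten.
cur = conn.execute(
    "UPDATE posts SET textfield = replace(textfield, '&#47', '/') "
    "WHERE textfield LIKE '%&#47%'"
)
conn.commit()

updated_rows = cur.rowcount
result = [row[0] for row in conn.execute("SELECT textfield FROM posts ORDER BY id")]
print(updated_rows)  # 1
print(result)
```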
Alternatively, if you are trying to convert HTML entities, ASCII characters, or between various encoding schemes, PostgreSQL has functions for that as well; see the PostgreSQL String Functions documentation.

The answer given by @davesnitty will work, but you need to think very carefully about whether the text pattern you're replacing could appear embedded in a longer pattern you don't want to modify. Otherwise you'll find someone's "nooking" a fire, and that's just weird.
If possible, use a suitable dedicated tool for what you're un-escaping. Got URL-encoded text? Use a URL decoder. Got XML entities? Process them through an XSLT stylesheet with text output mode. And so on. These are usually safer for your data than hacking at it with find-and-replace, which often has unfortunate side effects if not applied very carefully, as noted above.
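As one illustration of the dedicated-tool approach, assuming the un-escaping may run in application code: Python's standard html.unescape is a real HTML entity decoder, and SQLite (used here only for a self-contained demo) lets you register it as a SQL function with create_function. The function name html_unescape and the table posts are hypothetical. A decoder turns &#471 into the character it actually encodes (Ǘ) rather than "/1", but note it also decodes every other entity in the field, which may or may not be what you want.

```python
import html
import sqlite3

def html_unescape_or_null(value):
    # Keep SQL NULLs as NULL; html.unescape only accepts str.
    return None if value is None else html.unescape(value)

conn = sqlite3.connect(":memory:")
conn.create_function("html_unescape", 1, html_unescape_or_null)
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, textfield TEXT)")
conn.executemany(
    "INSERT INTO posts (textfield) VALUES (?)",
    [("contains &#47",), ("&#471 is a different character",), ("Tom &amp; Jerry",), (None,)],
)

# Only touch rows that look like they contain an entity at all.
conn.execute(
    "UPDATE posts SET textfield = html_unescape(textfield) WHERE textfield LIKE '%&%'"
)
conn.commit()
result = [row[0] for row in conn.execute("SELECT textfield FROM posts ORDER BY id")]
print(result)
```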
It's possible you may want to use a regular expression. They are not a universal solution to all problems but are really handy for some jobs.
If you want to unconditionally replace all instances of "&#47" with "/", you don't need a regexp.
If you want to replace "&#47" but not "&#471", you might need a regexp, because you can do things like match only whole words, match various patterns, specify min/max runs of digits, etc.
In the PostgreSQL string functions and operators documentation you'll find the regexp_replace function, which will let you apply a regexp during an UPDATE statement.
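To illustrate the &#47 versus &#471 point, here is a sketch using Python's re module as a stand-in for any regex engine: a negative lookahead rejects a match that is followed by another digit. PostgreSQL's advanced regular expressions also support (?!...), so the equivalent there would presumably be something like UPDATE tablename SET textfield = regexp_replace(textfield, '&#47(?!\d)', '/', 'g'); treat that as an untested sketch and try it on a copy of the data first.

```python
import re

# "&#47" only when the next character is not another digit, so a longer
# reference such as "&#471" is left untouched.
ENTITY_47 = re.compile(r"&#47(?!\d)")

def replace_slash_entity(text):
    return ENTITY_47.sub("/", text)

print(replace_slash_entity("contains &#47"))       # contains /
print(replace_slash_entity("keeps &#471 intact"))  # keeps &#471 intact
# A plain substring replacement gets the second case wrong:
print("keeps &#471 intact".replace("&#47", "/"))   # keeps /1 intact
```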
To be able to say much more I'd need to know what your real data is and what you're really trying to do.

If you don't have PostgreSQL, you can export the whole database to a SQL file, replace your string with a text editor, drop the database on your host, and re-import the edited dump.
PS: be careful

Related

Does array like function exist in SQL

Using SQL, I would like to know if it's possible to do the following:
If I have a variable into which the user enters multiple strings separated by commas, for example ('aa','bbb','c','dfd'), is it possible to use LIKE with a wildcard at the end of each string, instead of having the user enter each variation in multiple macros?
So if the user was looking for employee numbers that start with ('F','E','C'), is it possible without using OR? I guess that is the question I'm asking.
It would be similar to an array, I guess.
No, LIKE is its own operator, so each pattern needs its own LIKE, combined with OR.
You might prefer ILIKE to LIKE, as it is a case-insensitive comparison.
You can also try REGEXP_LIKE, which is close to what you want, except that you'll have to write a regular expression instead of a LIKE pattern such as 'F%'.
That depends on your SQL dialect; I don't know Impala at all, but other SQL engines have support for regular expressions in string matches, so that you can build a query string like
SELECT fld FROM tbl WHERE fld REGEXP '^[FEC].*$';
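As one concrete dialect: SQLite parses X REGEXP Y but ships no implementation of it; it calls a user-supplied regexp(Y, X) function. A sketch via Python's sqlite3 module, with tbl and fld taken from the query above and the sample rows invented. re.escape keeps each user-supplied prefix literal, and the finished pattern is passed as a bound parameter rather than spliced into the SQL.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite rewrites "value REGEXP pattern" as regexp(pattern, value).
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE tbl (fld TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?)", [("F100",), ("E200",), ("C300",), ("A400",)])

def starts_with_any(conn, prefixes):
    if not prefixes:
        return []  # an empty alternation would match every row
    # Same effect as '^[FEC].*$' for one-letter prefixes, but also safe for
    # longer prefixes containing regex metacharacters.
    pattern = "^(?:" + "|".join(re.escape(p) for p in prefixes) + ")"
    rows = conn.execute("SELECT fld FROM tbl WHERE fld REGEXP ? ORDER BY fld", (pattern,))
    return [row[0] for row in rows]

print(starts_with_any(conn, ["F", "E", "C"]))  # ['C300', 'E200', 'F100']
```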
No matter what you do, you will need to build a query from your user's input. Passing user input unprocessed into your SQL processor is a big "nope" anyway, from a "don't accidentally delete a table" point of view.
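A sketch of the "one LIKE per string, joined with OR" approach, shown with Python's sqlite3 module purely for illustration (Impala's client APIs differ): the SQL text is generated from the number of user strings, but the strings themselves travel as bound parameters, and any LIKE wildcards the user typed are escaped so they match literally. tbl and fld come from the query above; the sample rows are invented.

```python
import sqlite3

def escape_like(text):
    # Make %, _ and the escape character itself match literally.
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def find_by_prefixes(conn, prefixes):
    if not prefixes:
        return []
    # One "fld LIKE ?" per user string, joined with OR; only placeholders
    # are spliced into the SQL text, never the user's values.
    where = " OR ".join("fld LIKE ? ESCAPE '\\'" for _ in prefixes)
    params = [escape_like(p) + "%" for p in prefixes]
    sql = "SELECT fld FROM tbl WHERE " + where + " ORDER BY fld"
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (fld TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?)",
    [("F100",), ("E200",), ("C300",), ("A400",), ("50%off",), ("50xoff",)],
)
print(find_by_prefixes(conn, ["F", "E", "C"]))  # ['C300', 'E200', 'F100']
print(find_by_prefixes(conn, ["50%"]))          # ['50%off']
```

A hostile string such as "'; DROP TABLE tbl; --" simply matches nothing, because it is only ever compared as data.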

Searching for multiple values using regex

I have a large set of SQL scripts and need to find and replace a few values in them. I am trying to pass my values as a regex to Notepad++, but I can't seem to make it work. To be more specific, I have around 50 scripts, each with 5,000 lines, and I need to look for a list of values, e.g. "[dbo].[livesales]" and "[dbo].[CreditCards]", in all my scripts separately. I understand that I need to either run this separately against each script or merge them all into one file, but I need the proper regex for it. It needs to include the square brackets and dots as well. I ended up building this, but it doesn't work for me:
^(?=.*\b[dbo].[LiveSales]\b)(?=.*\b[dbo].[CreditCards]\b).+$
thanks in advance,
I wouldn't bother using word boundaries, as square brackets in SQL Server are pretty ubiquitous for database object names (e.g. database and column names). I suggest the following pattern:
\[dbo\]\.\[(?:LiveSales|CreditCards)\]
The major changes I have made are: not using word boundaries; escaping the [ and ] brackets (since the square bracket is a regex metacharacter with a special meaning) and the dot; and not trying to match the entire input. Presumably you want to find all such occurrences, so don't bother trying to anchor your pattern with ^ and $.
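The pattern can be sanity-checked outside Notepad++ with Python's re module, which treats the escaped brackets, the escaped dot and the (?:...|...) group the same way; the sample script below is invented. Notepad++ matches case-insensitively unless "Match case" is ticked, so [dbo].[livesales] from the question would also be found there; the sketch passes re.IGNORECASE to mimic that.

```python
import re

# Brackets and the dot are escaped; the names go in a non-capturing group.
# IGNORECASE mimics Notepad++ with "Match case" unticked.
PATTERN = re.compile(r"\[dbo\]\.\[(?:LiveSales|CreditCards)\]", re.IGNORECASE)

def find_references(script_text):
    """Return (line_number, matched_text) for every occurrence in a script."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits

sample = """SELECT * FROM [dbo].[livesales] ls
JOIN [dbo].[CreditCards] cc ON cc.Id = ls.CardId
WHERE ls.Note = 'dbo.LiveSales'  -- no brackets, not matched
"""
print(find_references(sample))
# [(1, '[dbo].[livesales]'), (2, '[dbo].[CreditCards]')]
```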

SQL Remove Substring From Query Results

I have a query that returns data from a database. A single field contains a rather long text comment with a segment that is clearly delimited by marker tags such as !markerstart! and !markerend!. I would like the query to return the field with the segment between the two markers removed (and the markers removed too).
I would normally do this client-side after getting the data back; however, the query is an INSERT query that gets its data from a SELECT statement. I don't want the text segment to be stored in the archival/reporting table (this is an OLTP application), so the SELECT statement has to return exactly what is to be inserted, which in this case means stripping out the unwanted phrase in the SELECT itself instead of in client-side post-processing.
My only thought is some convoluted combination of SUBSTRING, CHARINDEX, and CONCAT. I'm hoping there is a better way, but I don't see one. Does anyone have ideas?
Sample:
This is a long string of text in some field in a database that has a segment that needs to be removed. !markerstart! This is the segment that is to be removed. Its length is unknown and variable. !markerend! The part of this field that appears after the marker should remain.
Result:
This is a long string of text in some field in a database that has a segment that needs to be removed. The part of this field that appears after the marker should remain.
SOLUTION USING STUFF:
I really don't like how verbose this is, but I can put it in a function if I really need to. It isn't ideal, but it is easier and faster than a CLR routine.
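-- 11 = LEN('!markerend!'). STUFF returns NULL when the start marker is missing (start = 0)
-- or the computed length is negative, so ISNULL falls back to the original text.
SELECT ISNULL(STUFF(CAST(Description AS varchar(MAX)),
                    CHARINDEX('!markerstart!', Description),
                    CHARINDEX('!markerend!', Description) + 11 - CHARINDEX('!markerstart!', Description),
                    ''), Description) AS Description
FROM MyTable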
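The same cut-and-splice arithmetic can be tried in SQLite, where instr() and substr() play the roles of CHARINDEX and SUBSTRING; this is a port for experimentation, not SQL Server syntax. The sketch keeps the text before the start marker and the text after the end marker, and returns rows without a well-formed marker pair unchanged. The table notes and column d are hypothetical.

```python
import sqlite3

START, END = "!markerstart!", "!markerend!"

# Keep everything before the start marker plus everything after the end
# marker; rows lacking a start-then-end marker pair are returned as-is.
STRIP_SQL = """
SELECT CASE
         WHEN instr(d, :s) > 0 AND instr(d, :e) > instr(d, :s)
         THEN substr(d, 1, instr(d, :s) - 1) || substr(d, instr(d, :e) + length(:e))
         ELSE d
       END
FROM notes ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, d TEXT)")
conn.executemany("INSERT INTO notes (d) VALUES (?)", [
    ("keep this. !markerstart! drop this. !markerend! and this.",),
    ("no markers here",),
])
rows = [r[0] for r in conn.execute(STRIP_SQL, {"s": START, "e": END})]
print(rows)
```

Note the double space left where the segment was cut out; wrap the expression in a further replace if that matters.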
You may want to consider implementing a CLR user-defined function that returns the parsed data.
The following link demonstrates how to use a CLR UDF RegEx function for pattern matching and data extraction.
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Regards,
You can use the STUFF function or the REPLACE function and replace your unwanted text with ''.
STUFF(character_expression, start, length, replace_with_expression)

SQLite function that works like the Oracle's "Translate" function?

Oracle has a function called translate that can be used to replace individual characters of a string with others, in the same order that they appear. It differs from the replace function, which replaces each occurrence of the entire second argument with the entire third argument.
translate('1tech23', '123', '456'); --would return '4tech56'
translate('222tech', '2ec', '3it'); --would return '333tith'
I need this to implement a search on a SQLite database that ignores accents (Brazilian Portuguese) in my query string. The data in the queried table may be with or without accents, so, depending on how the user types the query string, the results would differ.
Example:
Searching for "maçã", the user could type "maca", "maça", "macã" or "maçã", and the data in the table could also be in one of the four possibilities.
Using Oracle, I would just use this:
Select Name, Id
From Fruits
Where Translate(Name, 'ãç','ac') = Translate(:QueryString, 'ãç','ac')
... and these other character substitutions:
áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙãõÃÕäëïöüÄËÏÖÜâêîôûÂÊÎÔÛñÑçÇ
by:
aeiouAEIOUaeiouAEIOUaoAOaeiouAEIOUaeiouAEIOUnNcC
Of course I could nest several calls to Replace, but this wouldn't be a good choice.
Thanks in advance for any help.
Open-source Oracle functions for SQLite have been written at Kansas State University. They include translate() (full UTF-8 support, by the way) and can be found here.
I don't believe there is anything in sqlite that will translate text in a single pass as you describe.
This wouldn't be difficult to implement as a user defined function however. Here is a decent starting reference.
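A sketch of that user-defined-function approach, assuming the application talks to SQLite through Python's sqlite3 module (other host languages register functions through their own APIs). The Python function mirrors Oracle's TRANSLATE, including deleting characters of the second argument that have no counterpart in the third; the accent lists are the ones from the question, and the Fruits rows are invented.

```python
import sqlite3

ACCENTED = "áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙãõÃÕäëïöüÄËÏÖÜâêîôûÂÊÎÔÛñÑçÇ"
PLAIN = "aeiouAEIOUaeiouAEIOUaoAOaeiouAEIOUaeiouAEIOUnNcC"

def translate(value, from_chars, to_chars):
    """Oracle-style TRANSLATE: the i-th char of from_chars becomes the i-th
    char of to_chars; chars with no counterpart in to_chars are removed."""
    if value is None or from_chars is None or to_chars is None:
        return None
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) not in table:  # first occurrence of a char wins
            table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return value.translate(table)

conn = sqlite3.connect(":memory:")
conn.create_function("translate", 3, translate)
conn.execute("CREATE TABLE Fruits (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Fruits (Name) VALUES (?)", [("maçã",), ("maca",), ("banana",)])

def search(query):
    sql = ("SELECT Name FROM Fruits "
           "WHERE translate(Name, :a, :p) = translate(:q, :a, :p) ORDER BY Id")
    return [row[0] for row in conn.execute(sql, {"a": ACCENTED, "p": PLAIN, "q": query})]

print(search("macã"))  # ['maçã', 'maca']
```

Because the function is applied to the column, SQLite cannot use an index on Name for this comparison; storing a pre-translated copy of the column is a common way around that.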
I used replace
REPLACE(string,pattern,replacement)
https://www.sqlitetutorial.net/sqlite-replace-function/

Return sql rows where field contains ONLY non-alphanumeric characters

I need to find out how many rows in my SQL Server table have a particular field that contains ONLY non-alphanumeric characters.
I'm thinking I need a regular expression along the lines of [^a-zA-Z0-9], but I'm not sure of the exact syntax to return the rows that contain no alphanumeric characters at all.
SQL Server doesn't have regular expressions. It uses the LIKE pattern matching syntax which isn't the same.
As it happens, you are close. You just need leading and trailing wildcards, and to move the NOT outside the pattern:
WHERE whatever NOT LIKE '%[a-z0-9]%'
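The same double-negative trick ("contains no alphanumeric character" is NOT "contains at least one") can be tried in SQLite, whose GLOB operator, unlike its LIKE, supports [...] character classes, with * in place of %. The table t and its rows are hypothetical. Note that an empty string counts as "only non-alphanumeric" under this test and NULLs are skipped; SQL Server's NOT LIKE behaves the same way, so add AND fld <> '' if that matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (fld TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("!!!",), ("-- ##",), ("abc",), ("a-b",), ("123",), ("",), (None,)],
)

# Rows with no character in [a-zA-Z0-9]: '!!!', '-- ##' and ''.
only_non_alnum = conn.execute(
    "SELECT COUNT(*) FROM t WHERE fld NOT GLOB '*[a-zA-Z0-9]*'"
).fetchone()[0]
print(only_non_alnum)  # 3
```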
If you have short strings, you should be able to write a few LIKE patterns ('[^a-zA-Z0-9]', '[^a-zA-Z0-9][^a-zA-Z0-9]', ...) to match strings of each length. Otherwise, use a CLR user-defined function with a proper regular expression; see "Regular Expressions Make Pattern Matching And Data Extraction Easier".
This will not work correctly in every case: e.g. abcÑxyz will pass through this filter because it contains a, b, c..., so you need to work with COLLATE or check each byte.