Conditional SQL replace - sql

Is it possible to conditionally replace parts of strings in MySQL?
Introduction to a problem: Users in my database stored articles (table called "table", column "value", each row = one article) with wrong links to images. I'd like to repair all of them at once. To do that, I have to replace all of the addresses in "href" links that are followed by images, i.e.,
<a href="link1"><img src="link2">
should be replaced by
<a href="link2"><img src="link2">
My idea is to search for each "href" tag and, if the tag is followed by an "img", obtain "link2" from the image and use it to replace "link1".
I know how to do it in bash or python but I do not have enough experience with MySQL.
To be specific, my table contains references to images like
<a href="www.a.cz/b/c"><img class="image image-thumbnail " src="www.d.cz/e/f.jpg" ...
I'd like to replace the first address (href) with the image link, to get
<a href="www.d.cz/e/f.jpg"><img class="image image-thumbnail " src="www.d.cz/e/f.jpg" ...
Is it possible to make a query (queries?) like
UPDATE `table`
SET value = REPLACE(value, 'www.a.cz/b/c', 'XXX')
WHERE `value` LIKE '%www.a.cz/b/c%'
where XXX differs every time and its value is obtained from the database? Moreover, "www.a.cz/b/c" varies.
To make things complicated, not all of the images have the "href" link and not all of the links refer to images. There are three possibilities:
"href" followed by "img" -> replace
"href" not followed by "img" -> keep original link (probably a link to another page)
"img" without "href" -> do nothing (there is no wrong link to replace)
Of course, some of the images may have a correct link. In this case it may be also replaced (original and new will be the same).
Database info from phpMyAdmin
Software: MariaDB
Software version: 10.1.32-MariaDB - Source distribution
Protocol version: 10
Server charset: UTF-8 Unicode (utf8)
Apache
Database client version: libmysql - 5.6.15
PHP extension: mysqli
Thank you in advance

SELECT
regexp_replace(
value,
'^<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$',
'<a href="\\3"><img class="\\2" src="\\3"\\4'
)
FROM
yourTable
The replacement only happens if the pattern is matched.
^ at the start means start of the string
([^"]+) means one or more characters, excluding "
(.*) means zero or more of any character
$ at the end means end of the string
The replacement takes the 3rd capture group (a pattern enclosed in parentheses, referenced as \\3) and puts it where the 1st capture group was.
The 2nd, 3rd and 4th capture groups are replaced with themselves (no change).
https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=96aef2214f844a1466772f41415617e5
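The same substitution can be tried outside the database. Here is a minimal sketch in Python's re module, assuming its behaviour is close enough to MariaDB's regex engine for this pattern. The name fix_link and the sample string are made up for illustration, and Python writes the back-references with a single backslash in a raw string.

```python
import re

# Same pattern as the SQL above: three quoted attribute values, then the rest.
PATTERN = re.compile(r'^<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$')
# Group 3 (the img src) is written into the href; groups 2, 3, 4 stay in place.
REPLACEMENT = r'<a href="\3"><img class="\2" src="\3"\4'


def fix_link(value: str) -> str:
    """Return value with the href replaced by the img src, if the pattern matches."""
    return PATTERN.sub(REPLACEMENT, value)


before = '<a href="www.a.cz/b/c"><img class="image image-thumbnail " src="www.d.cz/e/f.jpg" alt="x">'
print(fix_link(before))
# A string that does not match the pattern is returned unchanged.
print(fix_link('<img src="www.d.cz/e/f.jpg">'))
```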
If you have strings that don't exactly match the pattern, it will do nothing. Extra spaces will trip it up, for example.
In that case, you need to work out a new regular expression that matches all of the strings you want to work on. Then you can use the \\n back-references to make the replacements.
For example, the following deals with extra spaces in the href tag...
SELECT
regexp_replace(
value,
'^<a[ ]+href[ ]*=[ ]*"([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$',
'<a href="\\3"><img class="\\2" src="\\3"\\4'
)
FROM
yourTable
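To see the difference between the strict and the space-tolerant pattern, here is a small Python sketch (again assuming Python's re behaves like the database engine for these patterns; the sample string is invented):

```python
import re

STRICT = re.compile(r'^<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$')
# [ ]+ requires at least one space, [ ]* allows any number, including none.
LENIENT = re.compile(
    r'^<a[ ]+href[ ]*=[ ]*"([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$'
)
REPLACEMENT = r'<a href="\3"><img class="\2" src="\3"\4'

spaced = '<a  href = "www.a.cz/b/c"><img class="c" src="www.d.cz/e/f.jpg">'

strict_result = STRICT.sub(REPLACEMENT, spaced)    # no match, so unchanged
lenient_result = LENIENT.sub(REPLACEMENT, spaced)  # matched and rewritten

print(strict_result)
print(lenient_result)
```

Note that the replacement also normalises the spacing, because it writes a fixed `<a href="` prefix.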
EDIT:
Following comments clarifying that these are actually snippets from the MIDDLE of the string...
https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=48ce1cc3df5bf4d3d140025b662072a7
UPDATE
yourTable
SET
value = REGEXP_REPLACE(
value,
'<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"',
'<a href="\\3"><img class="\\2" src="\\3"'
)
WHERE
value REGEXP '<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"'
(Though I prefer the syntax RLIKE, it's functionally identical.)
This will also find and replace that pattern multiple times. It's not clear from the question whether that's desired or possible.

Solved, thanks to @MatBailie, but I had to modify his answer. The final query, including the update, is
UPDATE `table`
SET value = REGEXP_REPLACE(value, '(.*)<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)', '\\1<a href="\\4"><img class="\\3" src="\\4"\\5'
)
A wildcard (.*) had to be put at the beginning of the search pattern because the link is embedded in an article (long text); consequently, each back-reference number in the replacement is increased by one.
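One thing worth checking on a copy of the table before running this on real data: a leading greedy (.*) changes which occurrences get rewritten. The sketch below uses Python's re, where the unanchored pattern rewrites every link on a line but the (.*)-wrapped pattern rewrites only the last one. Whether MariaDB behaves identically is an assumption to verify; the article text is invented.

```python
import re

article = ('intro <a href="x1"><img class="c" src="y1"> middle '
           '<a href="x2"><img class="c" src="y2"> end')

# Unanchored pattern: every occurrence is found and rewritten.
plain = re.sub(
    r'<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"',
    r'<a href="\3"><img class="\2" src="\3"',
    article,
)

# Wrapped pattern: the greedy (.*) swallows everything up to the last
# occurrence, so the whole line is one match and only that link changes.
wrapped = re.sub(
    r'(.*)<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)',
    r'\1<a href="\4"><img class="\3" src="\4"\5',
    article,
)

print(plain)
print(wrapped)
```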

Related

TRIM or REPLACE in Netsuite Saved Search

I've looked at lots of examples for TRIM and REPLACE on the internet and for some reason I keep getting errors when I try.
I need to strip suffixes from my Netsuite item record names in a saved item search. There are three possible suffixes: -T, -D, -S. So I need to turn 24335-D into 24335, and 24335-S into 24335, and 24335-T into 24335.
Here's what I've tried and the errors I get:
Can you help me please? Note: I can't assume a specific character length of the starting string.
Use case: We already have a field on item records called Nickname with the suffixes stripped. But I've run into cases where Nickname is incorrect compared to Name. Ex: Name is 24335-D but Nickname is 24331-D. I'm trying to build a saved search alert that tells me any time the Nickname does not equal the suffix-stripped Name.
PS: is there anywhere I can pay for quick a la carte Netsuite saved search questions like this? I feel bad relying on free technical internet advice but I greatly appreciate any help you can give me!
You are including too much SQL: a formula is a single result-field expression, not a full statement, so there is no FROM or AS. The result column/field name is set in another place. One option here is REGEXP_REPLACE().
REGEXP_REPLACE({name},'\-[TDS]$', '')
Regex meaning:
\- : a literal -
[TDS] : one of T D or S
$ : end of line/string
To compare fields, a Formula (Numeric) using a CASE statement can be useful, as it makes it easy to compare the result to a number in a filter (a simple "equal to 1", for example).
CASE WHEN {custitem_nickname} <> REGEXP_REPLACE({name},'\-[TDS]$', '') then 1 else 0 end
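The logic of the two formulas can be checked quickly in Python. This is only a sketch of the regex and the comparison, not NetSuite code; strip_suffix and nickname_mismatch are hypothetical helper names.

```python
import re


def strip_suffix(name: str) -> str:
    """Remove one trailing -T, -D or -S, mirroring REGEXP_REPLACE(name, '\\-[TDS]$', '')."""
    return re.sub(r'-[TDS]$', '', name)


def nickname_mismatch(nickname: str, name: str) -> int:
    """Return 1 when the nickname differs from the suffix-stripped name, else 0."""
    return 1 if nickname != strip_suffix(name) else 0


for item in ['24335-D', '24335-S', '24335-T', '24335-X', 'T-24335']:
    print(item, '->', strip_suffix(item))

print(nickname_mismatch('24331', '24335-D'))  # the case the alert should catch
```

Because the pattern is anchored with $, a -T, -D or -S elsewhere in the name is left alone, and the length of the name does not matter.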
You are getting an error because TRIM can trim only one character; see the Oracle doc
https://docs.oracle.com/javadb/10.8.3.0/ref/rreftrimfunc.html (last example).
So try using something like this
TRIM(TRAILING '-' FROM TRIM(TRAILING 'D' FROM {entityid}))
And always keep in mind that saved searches are running as Oracle SQL queries so Oracle SQL documentation can help you understand how to use the available functions.
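A rough Python analogue of the nested TRIM, assuming Oracle-style semantics where TRIM(TRAILING c FROM s) removes every trailing copy of the single character c. nested_trim is a hypothetical name. It shows that this variant handles only the -D suffix and also strips a trailing D that has no hyphen in front of it:

```python
def nested_trim(entity_id: str) -> str:
    """Mimic TRIM(TRAILING '-' FROM TRIM(TRAILING 'D' FROM entity_id))."""
    # Inner TRIM: drop all trailing 'D' characters; outer TRIM: drop all trailing '-'.
    return entity_id.rstrip('D').rstrip('-')


for value in ['24335-D', '24335-S', 'ABCD']:
    print(value, '->', nested_trim(value))
```

Covering -T and -S the same way would need further nesting, which is one reason an anchored regular expression may be easier to maintain here.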

Trim a full string, not characters - Redshift

This is the same question as here, but the answers there were very specific to PHP (and I'm using Redshift SQL, not PHP).
I'm trying to remove specific suffixes from strings. I tried using RTRIM, but that will remove any of the listed characters, not just the full string. I only want the string changed if the exact suffix is there, and I only want it replaced once.
For example, RTRIM("name",' Inc') will convert "XYZ Company Corporation" into "XYZ Company Corporatio". (Removed final 'n' since that's part of 'Inc')
Next, I tried using a CASE statement to limit the incorrect replacements, but that still didn't fix the problem, since it will continue making replacements past the original suffix.
For example, when I run this:
CASE WHEN "name" LIKE '% Inc' THEN RTRIM("name",' Inc')
I get the following results:
"XYZ Association Inc" becomes "XYZ Associatio". (It trimmed Inc but also the final 'n')
I'm aware I can use the REPLACE function, but my understanding is that this will replace values from anywhere in the string, and I only want to replace when it exists at the end of the string.
How can I do this with Redshift? (I don't have the ability to use any other languages or tools here).
You can use REGEXP_REPLACE to remove the trailing Inc by using a regex that anchors the Inc to the end of the string:
CASE WHEN "name" LIKE '% Inc' THEN REGEXP_REPLACE("name", ' Inc$', '') ELSE "name" END
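The difference between trimming a set of characters and removing an exact suffix can be reproduced in Python, where str.rstrip works on a character set in the same way RTRIM is described in the question. This is a sketch for illustration only; the function names are made up.

```python
import re


def rtrim_chars(name: str) -> str:
    """Character-set trimming: strips any trailing ' ', 'I', 'n' or 'c'."""
    return name.rstrip(' Inc')


def strip_exact_suffix(name: str) -> str:
    """Remove the literal suffix ' Inc' only when it ends the string."""
    return re.sub(r' Inc$', '', name)


for company in ['XYZ Company Corporation', 'XYZ Association Inc']:
    print(repr(rtrim_chars(company)), repr(strip_exact_suffix(company)))
```

With the anchored pattern, a name that does not end in " Inc" comes back unchanged, so the LIKE check in the CASE expression is a guard rather than a requirement.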

regex not working correctly when the test is fine

For my database, I have a list of company numbers where some of them start with two letters. I have created a regex which should eliminate these from a query and according to my tests, it should. But when executed, the result still contains the numbers with letters.
Here is my regex, which I've tested on https://www.regexpal.com
([^A-Z+|a-z+].*)
I've tested it against numerous variations such as SC08093, ZC000191 and NI232312 which shouldn't match and don't in the tests, which is fine.
My sql query looks like;
SELECT companyNumber FROM company_data
WHERE companyNumber ~ '([^A-Z+|a-z+].*)' order by companyNumber desc
To summarise, strings like SC08093 should not match as they start with letters.
I've read through the documentation for postgres but I couldn't seem to find anything regarding this. I'm not sure what I'm missing here. Thanks.
The ~ '([^A-Z+|a-z+].*)' condition does not work because ~ is a regex matching operation that returns true even on a partial match (it does not require a full-string match, so the pattern can match anywhere in the string). [^A-Z+|a-z+].* matches any single character other than a letter from A to Z, +, | or a letter from a to z, followed by zero or more of any characters, anywhere inside the string. In SC08093, for instance, it matches from the 0 onward.
You may use
WHERE companyNumber NOT SIMILAR TO '[A-Za-z]{2}%'
See the online demo
Here, NOT SIMILAR TO returns the inverse result of the SIMILAR TO operation. This SIMILAR TO operator accepts patterns that are almost regex patterns, but are also like regular wildcard patterns. NOT SIMILAR TO '[A-Za-z]{2}%' means all records that start with two ASCII letters ([A-Za-z]{2}) and having anything after (%) are NOT returned and all others will be returned. Note that SIMILAR TO requires a full string match, same as LIKE.
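The partial-match versus full-match distinction can be illustrated in Python, using re.search as a stand-in for ~ and re.fullmatch as a stand-in for SIMILAR TO. That mapping is an approximation for illustration, and the two purely numeric sample values are invented.

```python
import re

numbers = ['SC08093', 'ZC000191', 'NI232312', '08093', '12345678']

# Like ~ : true as soon as the pattern matches somewhere in the string.
partial = [n for n in numbers if re.search(r'([^A-Z+|a-z+].*)', n)]

# Like NOT SIMILAR TO '[A-Za-z]{2}%' : the whole string must match, then negate.
kept = [n for n in numbers if not re.fullmatch(r'[A-Za-z]{2}.*', n)]

print(partial)  # every value appears, because each one contains a digit
print(kept)     # only the values that do not start with two letters
```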
Your pattern: [^A-Z+|a-z+].* means "a string where at least some characters are not A-Z" - to extend that to the whole string you would need to use an anchored regex as shown by S-Man (the group defined with (..) isn't really necessary btw)
I would probably use a regex that specifies what the valid pattern is and then use !~ instead.
where company !~ '^[0-9].*$'
^[0-9].*$ means "starts with a digit, followed by anything" (a pattern for "only consists of digits" would be ^[0-9]+$), and !~ means "does not match"
or
where not (company ~ '^[0-9].*$')
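A short Python check of what these anchored patterns accept, as a sketch with invented sample values (re.search stands in for ~, and "not re.search" for !~):

```python
import re

samples = ['08093', '0809X', 'SC08093']

# ^[0-9].*$ : first character is a digit, the rest can be anything.
starts_with_digit = [s for s in samples if re.search(r'^[0-9].*$', s)]

# ^[0-9]+$ : every character is a digit.
digits_only = [s for s in samples if re.search(r'^[0-9]+$', s)]

# Negated match, as with !~ or NOT (... ~ ...).
not_starting_with_digit = [s for s in samples if not re.search(r'^[0-9].*$', s)]

print(starts_with_digit)
print(digits_only)
print(not_starting_with_digit)
```

So the negated form returns the values that start with something other than a digit; dropping the negation returns the numeric-looking ones.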
"Does not start with a letter" could be done with
WHERE company ~ '^[^A-Za-z].*'
demo: db<>fiddle
The first ^ marks the beginning. The [^A-Za-z] says "no letter" (including small and capital letters).
Edit: Changed [A-z] into the more precise [A-Za-z] (Why is this regex allowing a caret?)
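The reason [A-z] is less precise than [A-Za-z] is that the ASCII range from A to z also contains six punctuation characters. A quick Python check (the character class semantics are the same idea in PostgreSQL, though this sketch only exercises Python's re):

```python
import re

# Characters between 'A' and 'z' in ASCII that are not letters.
extra = [chr(c) for c in range(ord('A'), ord('z') + 1) if not chr(c).isalpha()]
print(extra)

# The caret is accepted by [A-z] but rejected by [A-Za-z].
print(bool(re.fullmatch(r'[A-z]', '^')))
print(bool(re.fullmatch(r'[A-Za-z]', '^')))
```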

Lucene 5.0.0 - search string with special characters

I am using Lucene version 5.0.0.
In my search string, there is a minus character like “test-”.
I read that the minus sign is a special character in Lucene. So I have to escape that sign, as in the queryparser documentation:
Escaping Special Characters:
Lucene supports escaping special characters that are part of the query syntax. The current list special characters are:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
To escape these character use the \ before the character. For example to search for (1+1):2 use the query:
\(1\+1\)\:2
To do that I use the QueryParser.escape method:
query = parser.parse(QueryParser.escape(searchString));
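To make the effect of escaping concrete, here is a small Python sketch that puts a backslash in front of each character from the list quoted above. escape_query is a hypothetical helper written for illustration; it is not Lucene's implementation and is only meant to show what the escaped string looks like.

```python
# Single characters taken from the documented list (&& and || are covered
# by escaping each & and | individually).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_query(text: str) -> str:
    """Prefix every query-syntax character with a backslash."""
    return ''.join('\\' + ch if ch in SPECIAL_CHARS else ch for ch in text)


print(escape_query('(1+1):2'))  # the example from the documentation
print(escape_query('test-'))
```

Escaping only stops the parser from treating the characters as operators; what happens to them afterwards depends on the analyzer.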
I use the classic Analyzer because I noticed that the standard Analyzer has some problems with escaping special characters.
The problem is that the Parser deletes the special characters and so the Query has the term
content:test
How can I set up the parser and searcher to search for the real value “test-“?
I also created my own query with the content test-, but that didn't work either. I received 0 results, but my index has entries like:
Test-VRF
Test-IPLS
I am really confused about this problem.
While escaping special characters for the queryparser deals with part of the problem, it doesn't help with analysis.
Neither classic nor standard analyzer will keep punctuation in the indexed form of the field. For each of these examples, the indexed form will be in two terms:
test and vrf
test and ipls
This is why a manually constructed query for "test-" finds nothing. That term does not exist in the index.
The goal of these analyzers is to attempt to index words. As such, punctuation is mostly eliminated, and is not searchable. A phrase query for "test vrf" or "test-vrf" or "test_vrf" are all effectively identical. If that is not what you need, you'll need to look to other analyzers.
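The effect can be pictured with a toy model in Python. The tokenizer below (split on anything that is not a letter or digit, then lowercase) is an assumption made for illustration and is much simpler than Lucene's real analyzers, but it yields the same terms as listed above for these two entries.

```python
import re
from collections import defaultdict


def toy_analyze(text: str) -> list:
    """Toy analyzer: keep runs of letters/digits, lowercase them, drop punctuation."""
    return [token.lower() for token in re.findall(r'[A-Za-z0-9]+', text)]


def build_index(docs: list) -> dict:
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in toy_analyze(text):
            index[term].add(doc_id)
    return index


index = build_index(['Test-VRF', 'Test-IPLS'])
print(sorted(index))          # the terms that actually exist
print(index.get('test-'))     # the hand-built term was never indexed
print(index.get('test'))
```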
The way to fix this issue is to index the content value as NOT_ANALYZED.
Field fieldType = new Field(key.toLowerCase(),value, Field.Store.YES, Field.Index.NOT_ANALYZED);
Anyone with the same problem needs to take care with how the contents are stored in the index.
To query for the result, build the search string this way
searchString = QueryParser.escape(searchString);
and use, for example, a WhitespaceAnalyzer.

Postgresql database search with regex

I'm using PostgreSQL database with VB.NET and ODBC (Windows).
I'm searching sentences for whole words by combining SELECT with a regular expression, like this:
"SELECT dtbl_id, name
FROM mytable
WHERE name ~*'" + "( |^)" + TextBox1.Text + "([^A-z]|$)"
This searches well in some cases but because of syntax errors in text (or other reasons) it sometimes fails. For example, if I have the sentence
BILLY IDOL: WHITE WEDDING
the word "white" will be found. But if I have
CLASH-WHITE RIOT
then "white" will not be found, because there is no space before the word "white".
The simplest solution would be to temporarily change or replace characters in the sentences :,.\/-= etc to spaces.
Is this possible to do in single SELECT line to be suitable for use with .NET/ODBC? Maybe inside the same regular expression?
If it is, how?
Try this:
SELECT 'CLASH-WHITE RIOT' ~ '[[:<:]]WHITE[[:>:]]';
[[:<:]] and [[:>:]] simply mean the beginning and end of a word, respectively.
More info can be found at: http://www.postgresql.org/docs/9.1/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
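The word-boundary idea can be tried out in Python, where \b plays a role similar to [[:<:]] and [[:>:]]. This is an analogy for illustration, not PostgreSQL syntax, and the helper names are made up. The sketch also escapes the search word, on the assumption that user input such as TextBox1.Text may contain regex metacharacters. It does not address SQL quoting, which would still need a parameterised query.

```python
import re


def whole_word_pattern(word: str) -> str:
    """Build a pattern matching word only when delimited by word boundaries."""
    return r'\b' + re.escape(word) + r'\b'


def contains_word(sentence: str, word: str) -> bool:
    """Case-insensitive whole-word search, comparable to ~* with word boundaries."""
    return re.search(whole_word_pattern(word), sentence, re.IGNORECASE) is not None


# The pattern from the question requires a space or the start of the string.
original = r'( |^)white([^A-z]|$)'

for sentence in ['BILLY IDOL: WHITE WEDDING', 'CLASH-WHITE RIOT', 'WHITEWASH']:
    print(sentence,
          bool(re.search(original, sentence, re.IGNORECASE)),
          contains_word(sentence, 'white'))
```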