Trim a full string, not characters - Redshift - sql

This is the same question as here, but the answers there were very specific to PHP (and I'm using Redshift SQL, not PHP).
I'm trying to remove specific suffixes from strings. I tried using RTRIM, but that will remove any of the listed characters, not just the full string. I only want the string changed if the exact suffix is there, and I only want it replaced once.
For example, RTRIM("name",' Inc') will convert "XYZ Company Corporation" into "XYZ Company Corporatio". (Removed final 'n' since that's part of 'Inc')
Next, I tried using a CASE statement to limit the incorrect replacements, but that still didn't fix the problem, since it will continue making replacements past the original suffix.
For example, when I run this:
CASE WHEN "name" LIKE '% Inc' THEN RTRIM("name",' Inc')
I get the following results:
"XYZ Association Inc" becomes "XYZ Associatio". (It trimmed Inc but also the final 'n')
I'm aware I can use the REPLACE function, but my understanding is that this will replace values from anywhere in the string, and I only want to replace when it exists at the end of the string.
How can I do this with Redshift? (I don't have the ability to use any other languages or tools here).

You can use REGEXP_REPLACE to remove the trailing Inc by using a regex that anchors the Inc to the end of the string:
CASE WHEN "name" LIKE '% Inc' THEN REGEXP_REPLACE("name", ' Inc$', '') ELSE "name" END
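To see why the anchored pattern behaves differently from RTRIM, here is a minimal sketch in Python, used only as a stand-in for the SQL functions. `str.rstrip` strips a set of characters much like RTRIM does, and `re.sub` with `$` mirrors the anchored REGEXP_REPLACE. The helper name `strip_suffix` is made up for illustration, and Redshift's regex engine may differ from Python's in edge cases.

```python
import re

def strip_suffix(name: str, suffix: str = " Inc") -> str:
    """Remove `suffix` only when it is the exact ending of `name`."""
    # re.escape keeps the suffix literal; $ anchors it to the end of the string.
    return re.sub(re.escape(suffix) + r"$", "", name)

# Character-set trimming (like RTRIM) keeps removing any of ' ', 'I', 'n', 'c'.
trimmed = "XYZ Association Inc".rstrip(" Inc")

# The anchored replacement removes the suffix once and leaves other names alone.
fixed = strip_suffix("XYZ Association Inc")
untouched = strip_suffix("XYZ Company Corporation")

print(trimmed)
print(fixed)
print(untouched)
```

Because the pattern only matches when the suffix is present at the very end, the surrounding CASE expression is a safeguard rather than a requirement.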

Related

TRIM or REPLACE in Netsuite Saved Search

I've looked at lots of examples for TRIM and REPLACE on the internet and for some reason I keep getting errors when I try.
I need to strip suffixes from my Netsuite item record names in a saved item search. There are three possible suffixes: -T, -D, -S. So I need to turn 24335-D into 24335, and 24335-S into 24335, and 24335-T into 24335.
Here's what I've tried and the errors I get:
Can you help me please? Note: I can't assume a specific character length of the starting string.
Use case: We already have a field on item records called Nickname with the suffixes stripped. But I've run into cases where Nickname is incorrect compared to Name. Ex: Name is 24335-D but Nickname is 24331-D. I'm trying to build a saved search alert that tells me any time the Nickname does not equal the suffix-stripped Name.
PS: is there anywhere I can pay for quick a la carte Netsuite saved search questions like this? I feel bad relying on free technical internet advice but I greatly appreciate any help you can give me!
You are including too much SQL. A formula is a single result-field expression, not a full statement, so there is no FROM or AS. The result column/field name is set elsewhere. One option here is REGEXP_REPLACE().
REGEXP_REPLACE({name},'\-[TDS]$', '')
Regex meaning:
\- : a literal -
[TDS] : one of T D or S
$ : end of line/string
To compare fields, a Formula (Numeric) field using a CASE statement can be useful, because it makes it easy to compare the result to a number in a filter (a simple "equal to 1", for example).
CASE WHEN {custitem_nickname} <> REGEXP_REPLACE({name},'\-[TDS]$', '') then 1 else 0 end
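The pattern and the 1/0 comparison above can be tried outside Netsuite. The following is a minimal sketch in Python, used only to illustrate the regex; the function names are hypothetical, and Python's `re` module stands in for Oracle's REGEXP_REPLACE, so edge cases may differ.

```python
import re

# Same idea as '\-[TDS]$': a literal hyphen, one of T/D/S, then end of string.
SUFFIX = re.compile(r"-[TDS]$")

def strip_item_suffix(name: str) -> str:
    """Drop a trailing -T, -D or -S; any other name is returned unchanged."""
    return SUFFIX.sub("", name)

def nickname_mismatch(name: str, nickname: str) -> int:
    """Return 1 when the nickname differs from the suffix-stripped name, else 0."""
    return 1 if nickname != strip_item_suffix(name) else 0

print(strip_item_suffix("24335-D"))
print(nickname_mismatch("24335-D", "24331"))
```

Returning 1 or 0 mirrors the Formula (Numeric) approach, where the saved search filter simply checks for "equal to 1".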
You are getting an error because TRIM can remove only a single character; see the Oracle documentation:
https://docs.oracle.com/javadb/10.8.3.0/ref/rreftrimfunc.html (last example).
So try using something like this
TRIM(TRAILING '-' FROM TRIM(TRAILING 'D' FROM {entityid}))
And always keep in mind that saved searches are running as Oracle SQL queries so Oracle SQL documentation can help you understand how to use the available functions.

Conditional SQL replace

Is it possible to conditionally replace parts of strings in MySQL?
Introduction to the problem: Users in my database stored articles (table called "table", column "value", each row = one article) with wrong links to images. I'd like to repair all of them at once. To do that, I have to replace all of the addresses in "href" links that are followed by images, i.e.,
<a href="link1"><img src="link2">
should be replaced by
<a href="link2"><img src="link2">
My idea is to search for each "href" tag and, if the tag is followed by an "img", obtain "link2" from the image and use it to replace "link1".
I know how to do it in bash or python but I do not have enough experience with MySQL.
To be specific, my table contains references to images like
<a href="www.a.cz/b/c"><img class="image image-thumbnail " src="www.d.cz/e/f.jpg" ...
I'd like to replace the first address (href) with the image link, to get
<a href="www.d.cz/e/f.jpg"><img class="image image-thumbnail " src="www.d.cz/e/f.jpg" ...
Is it possible to make a query (queries?) like
UPDATE `table`
SET value = REPLACE(value, 'www.a.cz/b/c', 'XXX')
WHERE `value` LIKE '%www.a.cz/b/c%'
where XXX differs every time and its value is obtained from the database? Moreover, "www.a.cz/b/c" varies.
To make things complicated, not all of the images have the "href" link and not all of the links refer to images. There are three possibilities:
"href" followed by "img" -> replace
"href" not followed by "img" -> keep original link (probably a link to another page)
"img" without "href" -> do nothing (there is no wrong link to replace)
Of course, some of the images may have a correct link. In this case it may be also replaced (original and new will be the same).
Database info from phpMyAdmin
Software: MariaDB
Software version: 10.1.32-MariaDB - Source distribution
Protocol version: 10
Server charset: UTF-8 Unicode (utf8)
Apache
Database client version: libmysql - 5.6.15
PHP extension: mysqli
Thank you in advance
SELECT
  regexp_replace(
    value,
    '^<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$',
    '<a href="\\3"><img class="\\2" src="\\3"\\4'
  )
FROM
  yourTable
The replacement only happens if the pattern is matched.
^ at the start means start of the string
([^"]+) means one or more characters, excluding "
(.*) means zero or more of any character
$ at the end means end of the string
The replacement takes the 3rd parenthesized group (back-reference \\3) and puts it where the 1st parenthesized group was.
The 2nd, 3rd and 4th back-references are replaced with themselves (no change).
https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=96aef2214f844a1466772f41415617e5
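The back-reference mechanics described above can be sketched in Python as well. This is only an illustration under assumptions: Python's `re` is used in place of MariaDB's REGEXP_REPLACE, the function name `fix_link` is hypothetical, and the trailing ` alt="x">` in the sample is invented because the original snippet is truncated.

```python
import re

# Group 1 = old href, group 2 = class, group 3 = src, group 4 = rest of the string.
PATTERN = re.compile(r'^<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$')

# \3 (the src) is written into the href slot; \2, \3 and \4 are put back unchanged.
REPLACEMENT = r'<a href="\3"><img class="\2" src="\3"\4'

def fix_link(value: str) -> str:
    """Rewrite the href to the image src; non-matching strings are returned as-is."""
    return PATTERN.sub(REPLACEMENT, value)

sample = ('<a href="www.a.cz/b/c"><img class="image image-thumbnail " '
          'src="www.d.cz/e/f.jpg" alt="x">')
print(fix_link(sample))
```

A string that deviates from the pattern, for instance one with two spaces between `<a` and `href`, passes through unchanged, which is the "replacement only happens if the pattern is matched" behavior.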
If you have strings that don't exactly match the pattern, it will do nothing. Extra spaces will trip it up, for example.
In which case you need to work out a new regular expression that always matches all of the strings you want to work on. Then you can use the \\n back-references to make replacements.
For example, the following deals with extra spaces in the href tag...
SELECT
  regexp_replace(
    value,
    '^<a[ ]+href[ ]*=[ ]*"([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)$',
    '<a href="\\3"><img class="\\2" src="\\3"\\4'
  )
FROM
  yourTable
EDIT:
Following comments clarifying that these are actually snippets from the MIDDLE of the string...
https://dbfiddle.uk/?rdbms=mariadb_10.2&fiddle=48ce1cc3df5bf4d3d140025b662072a7
UPDATE
  yourTable
SET
  value = REGEXP_REPLACE(
    value,
    '<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"',
    '<a href="\\3"><img class="\\2" src="\\3"'
  )
WHERE
  value REGEXP '<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"'
(Though I prefer the syntax RLIKE, it's functionally identical.)
This will also find and replace that pattern multiple times. You're not clear whether that's desired or possible.
Solved, thanks to @MatBailie, but I had to modify his answer. The final query, including the update, is
UPDATE `table`
SET value = REGEXP_REPLACE(
  value,
  '(.*)<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"(.*)',
  '\\1<a href="\\4"><img class="\\3" src="\\4"\\5'
)
A wildcard (.*) had to be put at the beginning of the search pattern because the link sits inside an article (long text); consequently, the back-reference numbers in the replacement pattern each increase by one.
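One difference between the unanchored pattern and the variant with a leading (.*) is how many links a single pass rewrites. The sketch below shows this in Python's `re`, used only as a stand-in; the function names and the sample article are hypothetical, and MariaDB's regex engine has not been checked here, so it is worth verifying if an article can contain several image links.

```python
import re

LINK = r'<a href="([^"]+)"><img class="([^"]+)" src="([^"]+)"'

def fix_all(value: str) -> str:
    # Unanchored: every occurrence of the link pattern is rewritten.
    return re.sub(LINK, r'<a href="\3"><img class="\2" src="\3"', value)

def fix_with_leading_wildcard(value: str) -> str:
    # The greedy (.*) swallows everything up to the LAST link, so the link
    # groups become 2-4 and only that last link is rewritten in one pass.
    return re.sub(r'(.*)' + LINK + r'(.*)',
                  r'\1<a href="\4"><img class="\3" src="\4"\5', value)

article = ('text <a href="old1"><img class="c" src="new1"> more '
           '<a href="old2"><img class="c" src="new2"> end')
print(fix_all(article))
print(fix_with_leading_wildcard(article))
```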

Hyphenated terms in KDB WHERE clause list

I'm trying to list hyphenated criteria in a KDB WHERE IN list. The single (non-hyphenated) terms work just fine but when I need to have a hyphen in the literal, KDB doesn't like it. I've tried quoting the strings in a comma delimited list but that doesn't seem to work either.
This works just fine:
where product in (`CD`MUNICIPAL)
This gives me an error:
where product in (`TREASURY-NOTE`TREASURY-BOND`TREASURY-TIPS)
Error:
'TIPS
This is what I'm trying but with no luck:
where product in ("TREASURY-NOTE","TREASURY-BOND","TREASURY-TIPS")
Because "-" is a special character you need to declare these as strings before casting to symbols.
where product in `$("TREASURY-NOTE";"TREASURY-BOND";"TREASURY-TIPS")
You could also use "like", which supports basic wildcard pattern matching (not full regular expressions):
where product like "TREASURY*"

substring extraction in HQL

There's a URL field in my Hive DB that is of string type with this specific pattern:
/Cats-g294078-o303631-Maine_Coon_and_Tabby.html
and I would like to extract the two Cat "types" near the end of the string, with the result being something like:
mainecoontabby
Basically, I'd like to extract only the Cat "types", as one lowercase string. They are always separated by '_and_', preceded by '-', and followed by '.html'.
Is there a simple way to do this in HQL? I know HQL has limited functionality, otherwise I'd be using regexp or substring or something like that.
Thanks,
Clark
HQL does have a substr function as cited here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions
It returns the piece of a string starting at a value until the end (or for a particular length)
I'd also utilize the function locate to determine the location of the '-' and '_' in the URL.
As long as there are always three dashes and three underscores, this should be pretty straightforward.
Might need case statements to determine number of dashes and underscores otherwise.
solution here...
LOWER(REGEXP_REPLACE(SUBSTRING(catString, LOCATE('-', catString, 19)+1), '(_to_)|(\.html)|_', ''))
Interestingly, the following did NOT work... JJFord3, any idea why?
LOWER(REGEXP_EXTRACT(SUBSTRING(FL.url, LOCATE('-', FL.url, 19)+1), '[^(_to_)|(\.html)|_]', 0))
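The substring-then-replace approach can be checked against the example URL from the question. This is a minimal sketch in Python, not Hive: the function name `cat_types` is hypothetical, the joiner is written as `_and_` to match the example URL (the posted solution uses `_to_`), and Hive's Java regex engine may differ from Python's `re`. The last lines also show, in Python at least, that the bracketed pattern from the non-working attempt is read as a single character class, so it matches one character at a time.

```python
import re

def cat_types(url: str) -> str:
    """Return the cat types after the last dash as one lowercase string."""
    # LOCATE('-', url, 19) is 1-based; str.find takes a 0-based start index.
    tail = url[url.find("-", 18) + 1:]
    # Drop the joiner, the extension, and any remaining underscores.
    return re.sub(r"(_and_)|(\.html)|_", "", tail).lower()

url = "/Cats-g294078-o303631-Maine_Coon_and_Tabby.html"
print(cat_types(url))

# Inside [...] the parentheses and | are literal characters, so this pattern
# means "one character that is not any of ( _ t o ) | \ . h m l".
first_hit = re.search(r"[^(_to_)|(\.html)|_]", "Maine_Coon_and_Tabby.html").group(0)
print(first_hit)
```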

Remove Special Characters from an Oracle String

From within an Oracle 11g database, using SQL, I need to remove the following sequence of special characters from a string, i.e.
~!@#$%^&*()_+=\{}[]:”;’<,>./?
If any of these characters exist within a string, except for these two characters, which I DO NOT want removed, i.e.: "|" and "-" then I would like them completely removed.
For example:
From: 'ABC(D E+FGH?/IJK LMN~OP' To: 'ABCD EFGHIJK LMNOP' after removal of special characters.
I have tried this small test which works for this sample, i.e:
select regexp_replace('abc+de)fg','\+|\)') from dual
but is there a better means of using my sequence of special characters above without doing this string pattern of '\+|\)' for every special character using Oracle SQL?
You can replace anything other than letters and space with empty string
[^a-zA-Z ]
here is online demo
As per below comments
I still need to keep the following two special characters within my string, i.e. "|" and "-".
Just exclude more
[^a-zA-Z |-]
Note: the hyphen - must be placed at the start or end of the character class, or escaped like \-, because otherwise it has a special meaning inside a character class: it defines a range.
For more info read about Character Classes or Character Sets
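A quick way to check the negated character class is to run it on the example from the question. The sketch below uses Python's `re` as a stand-in for Oracle's REGEXP_REPLACE and assumes the space is kept in the class, as in the first pattern; the function name is hypothetical. Note that this class also removes digits, since only letters, space, | and - are allowed through.

```python
import re

def keep_letters_space_pipe_hyphen(text: str) -> str:
    """Remove every character that is not a letter, space, '|' or '-'."""
    # The hyphen is last in the class, so it is a literal rather than a range.
    return re.sub(r"[^a-zA-Z |-]", "", text)

print(keep_letters_space_pipe_hyphen("ABC(D E+FGH?/IJK LMN~OP"))
print(keep_letters_space_pipe_hyphen("A|B-C!"))
```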
Consider using this regex replacement instead:
REGEXP_REPLACE('abc+de)fg', '[~!@#$%^&*()_+=\\{}[\]:”;’<,>.\/?]', '')
The replacement will match any character from your list.
Here is a regex demo!
The regex to match your sequence of special characters is:
[]~!@#$%^&*()_+=\{}[:”;’<,>./?]+
I think you still haven't escaped all of the regex-special characters.
To get there, work iteratively:
build a test string, then build up your regex character by character and check that it removes what you expect to be removed.
If the latest character does not work, you have to escape it.
That should do the trick.
SELECT TRANSLATE('~!@#$%sdv^&*()_+=\dsv{}[]:”;’<,>dsvsdd./?', '~!@#$%^&*()_+=\{}[]:”;’<,>./?',' ')
FROM dual;
result:
TRANSLATE
-------------
sdvdsvdsvsdd
SQL> select translate('abc+de#fg-hq!m', 'a+-#!', 'a') from dual;
TRANSLATE(
----------
abcdefghqm
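The TRANSLATE idea, deleting every character from a fixed list without any regex escaping, has a close analogue in Python's `str.translate`. The following is a minimal sketch in Python rather than Oracle SQL; the function name `remove_chars` is hypothetical, and the short list of unwanted characters is chosen only to cover the sample strings.

```python
def remove_chars(text: str, unwanted: str) -> str:
    """Delete every character of `unwanted` from `text`; no regex escaping needed."""
    # The third argument of str.maketrans lists characters to drop entirely.
    table = str.maketrans("", "", unwanted)
    return text.translate(table)

print(remove_chars("abc+de#fg-hq!m", "+-#!"))
print(remove_chars("ABC(D E+FGH?/IJK LMN~OP", "(+?/~"))
```

Because the characters are treated literally, anything left out of the list, such as | and -, simply stays in the string.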