Regex remove all occurrences of multiple characters in a string - sql

In my PostgreSQL I want to replace all characters (;<>) occurrences in a string.
My query:
update table_name set text = regexp_replace(text, '/[(;<>)]+/g', '');
I think my regexp is wrong. Can anyone help me out with it?

Use the much faster translate() for this simple case:
UPDATE tbl SET text = translate(text, '(;<>)', '');
Every character in the second parameter that has no counterpart in the third parameter is replaced with nothing.
The regular expression solution could look like this:
regexp_replace(text, '[(;<>)]', '', 'g');
Essential element is the 4th parameter 'g' to replace "globally" instead of just the first match. The second parameter is a character class.
You were on the right track, just a matter of syntax for regexp_replace().
Hint on UPDATE
If you don't expect all rows to be changed, I would strongly advise to adapt your UPDATE statement:
UPDATE tbl
SET text = translate(text, '(;<>)', '')
WHERE text <> translate(text, '(;<>)', '');
This way you avoid (expensive) empty updates. (NULL is covered automatically in this particular case.)

Related

Postgres SQL regexp_replace replace all number

I need some help with the next. I have a field text in SQL, this record a list of times sepparates with '|'. For example
'14613|15474|3832|148|5236|5348|1055|524' Each value is a time in milliseconds. This field could any length, for example is perfect correct '3215|2654' or '4565' (only 1 value). I need get this field and replace all number with -1000 value.
So '14613|15474|3832|148|5236|5348|1055|524' will be '-1000|-1000|-1000|-1000|-1000|-1000|-1000|-1000'
Or '3215|2654' => '-1000|-1000' Or '4565' => '-1000'.
I try use regexp_replace(times_field,'[[:digit:]]','-1000','g') but it replace each digit, not the complete number, so in this example:
'3215|2654' than must be '-1000|-1000', i get:
'-1000-1000-1000-1000|-1000-1000-1000-1000', I try with other combinations and more options of regexp but i'm done.
Please need your help, thanks!!!.
We can try using REGEXP_REPLACE here:
UPDATE yourTable
SET times_field = REGEXP_REPLACE(times_field, '\y[0-9]+\y', '-1000', 'g');
If instead you don't really want to alter your data but rather just view your data this way, then use a select:
SELECT
times_field,
REGEXP_REPLACE(times_field, '\y[0-9]+\y', '-1000', 'g') AS times_field_replace
FROM yourTable;
Note that in either case we pass g as the fourtb parameter to REGEXP_REPLACE to do a global replacement of all pipe separated numbers.
[[:digit:]] - matches a digit [0-9]
+ Quantifier - matches between one and unlimited times, as many times as possible
your regexp must look like
regexp_replace(times_field,'[[:digit:]]+','-1000','g')

How can I remove characters in a string after a specific special character (~) in snowflake sql?

I am using Snowflake SQL. I would like to remove characters from a string after a special character ~. How can I do that?
here is the whole scenario. Let me explain. I do have a string like 'CK#123456~fndkjfgdjkg'. Now, i want only the number after #.And not anything after ~. This is number length varies for that field value. It might be 1 or 5 or 3. And i want to add the condition in where class where this number is equal to check_num from other table after joining. I am trying REGEXP_SUBSTR(A.SRC_TXT, '(?<=CK#)(.+?\b)') = C.CHK_NUM in the where condition. I am getting the error as 'No repititive argument after ?'
You can use a regex for this
-- To remove just the character after a ~
select regexp_replace('fo~o bar','~.', '');
-- returns 'fo bar'
--If you want to keep the ~
select regexp_replace('fo~o bar','~.', '~');
-- returns 'fo~ bar'
--If you want to remove everything after the ~
select regexp_replace('fo~o bar','~.*', '');
--returns 'fo'
If you need to remove other specific character sets after a ~, you can probably do this with a slightly more complicated regex, but I'd need examples of your desired input/output to help with that.
EDIT for updated question
This regex replace should get what you need.
select regexp_replace('CK#123456~fndkjfgdjkg','CK#(\\d*)~.*', '\\1');
-- returns 123456
(\\d*) gets ANY number of digits in a row, and the \\1 causes it to replace the match with what was in the first set of parenthesis, which is your list of digits. the CK# and ~.* are there to make sure the whole string gets matched and replaced.
If the CK# can vary as well, you can use .*? like this.
select regexp_replace('ABCD123HI#123456~fndkjfgdjkg','.*?#(\\d*)~.*', '\\1')
-- returns 123456
I'd probably do something like the following, easy enough but not as cool as RegEx type of functions.
set my_string='fooo~12345';
set search_for_me = '~';
SELECT SUBSTR($my_string, 1, DECODE(position($search_for_me, $my_string), 0, length($my_string), position($search_for_me, $my_string)));
I hope this helps...Rich
It looks like lookahead and lookbehinds are not supported in REGEXP functions, they seem to work in the PATTERN clause of a LIST command. Snowflake documentation makes no mention either way of lookahead or lookbehinds.
In your example:
It seems that the query engine is looking for that repeating argument, where you are attempting a lookbehind
You have not specified what you wanted extracted. You have two capture groups, but in this scenario everything would be returned
Since you are looking to remove everything after ~ you have a delimiter, why not use it in your REGEXP_SUBSTR function?
Try the following:
SELECT $1,REGEXP_SUBSTR($1,'\\w+#(.+?)~',1,1,'is',1)
FROM VALUES
('CK#123456~fndkjfgdjkg')
,('QH#128fklj924~fndkjfgdjkg')
;
This looks for:
One or more word characters
Followed by #
Capturing one or more characters upto and not including ~
Returns the characters within the capture group
You can change the .+? to \\d+? to make sure the pattern is only digits. Backslashes must be escaped with a backslash.
The descriptions for each argument of the function can be found here:
https://docs.snowflake.net/manuals/sql-reference/functions/regexp_substr.html
You could check this!!
select substr('CK#123456~fndkjfgdjkg',4,6) from dual;
OUTPUT
123456
https://docs.snowflake.net/manuals/sql-reference/functions/substr.html

Replacing characters in the oracle database value range

I have a database in an oracle, I filled in the "Number" fields with numbers starting from "A14602727" to "A14603000" but it turned out that I accidentally typed the symbol A (in Ukrainian) instead of A (in English). And now I when find the command:
select number from test where number 'А14602727'
...find me nothing. Is it possible somehow with the help of the command to replace all numbers from "A14602727" (Ukrainian) to "A14602727" (English) ?
I will be grateful for your help!)
You can use following trick to convert any character to its base character using accent-insesitive binary sorting.
select your_col,
utl_raw.cast_to_varchar2(nlssort(your_col, 'nls_sort=binary_ai')) converted_col
from your_table
Use it in update statement accordingly.
Cheers!!
You could use regexp_replace():
update test set number = regexp_replace(number, '^Ä', 'A');
Regexp '^Ä' represents character 'Ä' at the beginning of the string. I used 'Ä' to represent the A in Ukrainian: replace this with the correct character that you want to replace.
This can also be done, probably more efficiently, with substr and like:
update test set number = 'A' || substr(number, 2) where number like 'Ä%';
Use the simple Oracle REPLACE function:
UPDATE test
SET number = REPLACE(number, 'A', 'A');
Replace is taking 3 parameters:
String in which you will search for a string to replace (The number column)
The string you are searchin gto replace (Ukrainian A)
The string you will replace it with (English A)
Here you have a DEMO example.
Also please note that the query in your question:
select number from test where number 'А14602727';
Is not valid. It should be something like this:
select number from test where number like 'А14602727';
or like this:
select number from test where number = 'А14602727';
So first do check that!
Cheers!

Match Character Whether or Not It Exists in Like Statement

I need a like expression that will match a character whether or not it exists. It needs to match the following values:
..."value": "123456"...
..."value": "123456"...
"...value":"123456"...
This like statement will almost work: LIKE '%value":%"123456"%'
But there are values like this one that would also match, but I don't want returned:
..."value":"99999", "other":"123456"...
A regex expression to do what I'm looking to do is 'value": *?"123456"'. I need to do this in SQL Server 2008 and I don't believe there is good regex support in that version. How can I match using a like statement?
Remove the whitespace in your compare with REPLACE():
WHERE REPLACE(column,' ','') LIKE '%"value":"123456"%'
May need a double replace for tabs:
REPLACE(REPLACE(column,' ',''),' ','')
I don't think you can with the like operator. You could exclude ones you could match, like if you want to make sure it just doesn't contain other:
[field] LIKE '%value":%"123456"%` AND [field] NOT LIKE '%"other"%'
Otherwise I think you'd have to do some processing on the string. You could write a UDF to take the string and parse it to find the value for 'value' and compare based on that:
dbo.fn_GetValue([field], 'value') = '123456'
The function could find the index of '"' + #name + '"', find the next index of a quote, and the one after that, then get the string between those two quotes and return it.

SQL String contains ONLY

I have a table with a field that denotes whether the data in that row is valid or not. This field contains a string of undetermined length. I need a query that will only pull out rows where all the characters in this field are N. Some possible examples of this field.
NNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNN
NNNNNEEEENNNNNNNNNNNN
NNNNNOOOOOEEEENNNNNNNNNNNN
Any suggestions on a postcard please.
Many thanks
This should do the trick:
SELECT Field
FROM YourTable
WHERE Field NOT LIKE '%[^N]%' AND Field <> ''
What it's doing is a wildcard search, broken down:
The LIKE will find records where the field contains characters other than N in the field. So, we apply a NOT to that as we're only interested in records that do not contain characters other than N. Plus a condition to filter out blank values.
SELECT *
FROM mytable
WHERE field NOT LIKE '%[^N]%'
I don't know which SQL dialect you are using. For example Oracle has several functions you may use. With oracle you could use condition like :
WHERE LTRIM(field, 'N') = ''
The idea is to trim out all N's and see if the result is empty string. If you don't have LTRIM, check if you have some kind of TRANSLATE or REPLACE function to do the same thing.
Another way to do it could be to pick length of your field and then construct comparator value by padding empty string with N. Perhaps something like:
WHERE field = RPAD('', field, 'N)
Oracle pads that empty string with N's and picks number of pad characters from length of the second argument. Perhaps this works too:
WHERE field = RPAD('', LENGTH(field), 'N)
I haven't tested those, but hopefully that give you some ideas how to solve your problem. I guess that many of these solutions have bad performance if you have lot of rows and you don't have other WHERE conditions to select proper index.