Replace function doesn't work as expected - sql

I'm having trouble figuring out why REPLACE() doesn't work correctly.
I'm getting a string formatted as:
RISHON_LEZION-CMTSDV4,Cable7/0/4/U1;RISHON_LEZION-CMTSDV4,Cable7/0/4/U2;RISHON_LEZION-CMTSDV4,Cable7/0/5/U0;.....
Up to 4000 characters .
Each spot of ; represent a new string(can be up to about 15 in one string). I'm splitting it by using REPLACE() - each occurence of ; replace with $ + go down a line + concat the entire string again (I have another part that is splitting down the string)
I think the length of the string is some how effecting the result, though I never heard replace has some kind of limitation about the length of the string.
SELECT REPLACE(HOT_ALERTKEY_PK, ';', '$' || CHR(13) || CHR(10) || HOT_ALERTKEY_PK || '$')
from (SELECT 'RISHON_LEZION-CMTSDV4,Cable7/0/3/U0;RISHON_LEZION-CMTSDV4,Cable7/0/3/U1;RISHON_LEZION-CMTSDV4,Cable7/0/3/U2;RISHON_LEZION-CMTSDV4,Cable7/0/4/U0;RISHON_LEZION-CMTSDV4,Cable7/0/4/U1;RISHON_LEZION-CMTSDV4,Cable7/0/4/U2;RISHON_LEZION-CMTSDV4,Cable7/0/5/U0;RISHON_LEZION-CMTSDV4,Cable7/0/5/U1;RISHON_LEZION-CMTSDV4,Cable7/0/5/U2;RISHON_LEZION-CMTSDV4,Cable7/0/7/U0;RISHON_LEZION-CMTSDV4,Cable7/0/7/U1;RISHON_LEZION-CMTSDV4,Cable7/0/7/U2;RISHON_LEZION-CMTSDV4,Cable7/0/9/U0;RISHON_LEZION-CMTSDV4,Cable7/0/9/U1;RISHON_LEZION-CMTSDV4,Cable7/0/9/U2' as hot_alertkey_pk
FROM dual)
This for some reason result in splitting the string correctly, up to cable7/0/5/U0; , and stops. If I remove one or more parts from the start of the string (up to the semicolumn is each part) then I'm getting it up to the next cables, according to how many I remove from the beggining.
Why is this happening ?
Thanks in advance.

If you wrap your sample input string within to_clob() in the inner query, and you wrap the resulting string within length() in the outer query, you will find that the result is 8127 characters. This answers your question, but only partially.
I am not sure why replace doesn't throw an error, or perhaps just truncate the result at 4000 characters. I got exactly the same result as you did in Oracle 11.2, with the result chopped off after 3503 characters. I just looked quickly at the Oracle documentation for replace() and it doesn't say what the behavior should be if the input is VARCHAR2 but the output is more than 4000 characters. It looks as though it performed as many substitutions as it could and then it stopped (the next substitution would have gone above 4000 characters).

Related

How can I remove characters in a string after a specific special character (~) in snowflake sql?

I am using Snowflake SQL. I would like to remove characters from a string after a special character ~. How can I do that?
here is the whole scenario. Let me explain. I do have a string like 'CK#123456~fndkjfgdjkg'. Now, i want only the number after #.And not anything after ~. This is number length varies for that field value. It might be 1 or 5 or 3. And i want to add the condition in where class where this number is equal to check_num from other table after joining. I am trying REGEXP_SUBSTR(A.SRC_TXT, '(?<=CK#)(.+?\b)') = C.CHK_NUM in the where condition. I am getting the error as 'No repititive argument after ?'
You can use a regex for this
-- To remove just the character after a ~
select regexp_replace('fo~o bar','~.', '');
-- returns 'fo bar'
--If you want to keep the ~
select regexp_replace('fo~o bar','~.', '~');
-- returns 'fo~ bar'
--If you want to remove everything after the ~
select regexp_replace('fo~o bar','~.*', '');
--returns 'fo'
If you need to remove other specific character sets after a ~, you can probably do this with a slightly more complicated regex, but I'd need examples of your desired input/output to help with that.
EDIT for updated question
This regex replace should get what you need.
select regexp_replace('CK#123456~fndkjfgdjkg','CK#(\\d*)~.*', '\\1');
-- returns 123456
(\\d*) gets ANY number of digits in a row, and the \\1 causes it to replace the match with what was in the first set of parenthesis, which is your list of digits. the CK# and ~.* are there to make sure the whole string gets matched and replaced.
If the CK# can vary as well, you can use .*? like this.
select regexp_replace('ABCD123HI#123456~fndkjfgdjkg','.*?#(\\d*)~.*', '\\1')
-- returns 123456
I'd probably do something like the following, easy enough but not as cool as RegEx type of functions.
set my_string='fooo~12345';
set search_for_me = '~';
SELECT SUBSTR($my_string, 1, DECODE(position($search_for_me, $my_string), 0, length($my_string), position($search_for_me, $my_string)));
I hope this helps...Rich
It looks like lookahead and lookbehinds are not supported in REGEXP functions, they seem to work in the PATTERN clause of a LIST command. Snowflake documentation makes no mention either way of lookahead or lookbehinds.
In your example:
It seems that the query engine is looking for that repeating argument, where you are attempting a lookbehind
You have not specified what you wanted extracted. You have two capture groups, but in this scenario everything would be returned
Since you are looking to remove everything after ~ you have a delimiter, why not use it in your REGEXP_SUBSTR function?
Try the following:
SELECT $1,REGEXP_SUBSTR($1,'\\w+#(.+?)~',1,1,'is',1)
FROM VALUES
('CK#123456~fndkjfgdjkg')
,('QH#128fklj924~fndkjfgdjkg')
;
This looks for:
One or more word characters
Followed by #
Capturing one or more characters upto and not including ~
Returns the characters within the capture group
You can change the .+? to \\d+? to make sure the pattern is only digits. Backslashes must be escaped with a backslash.
The descriptions for each argument of the function can be found here:
https://docs.snowflake.net/manuals/sql-reference/functions/regexp_substr.html
You could check this!!
select substr('CK#123456~fndkjfgdjkg',4,6) from dual;
OUTPUT
123456
https://docs.snowflake.net/manuals/sql-reference/functions/substr.html

Get only Number

How to ignore special characters and get only number with the below input as string.
Input: '33-01-616-000'
Output should be 3301616000
Use the REPLACE() function to remove the - characters.
REPLACE(columnname, '-', '')
Or if there can be other non-numeric characters, you can use REGEXP_REPLACE() to remove anything that isn't a number.
REGEXP_REPLACE(columnname, '\D', '')
Standard string functions (like REPLACE, TRANSLATE etc.) are often much faster (one order of magnitude faster) than their regular expression counterparts. Of course, this is only important if you have a lot of data to process, and/or if you don't have that much data but you must process it very frequently.
Here is one way to use TRANSLATE for this problem even if you don't know ahead of time what other characters there may be in the string - besides digits:
TRANSLATE(columnname, '0123456789' || columnname, '0123456789')
This will map 0 to 0, 1 to 1, etc. - and all other characters in the input string columnname to nothing (so they will be simply removed). Note that in the TRANSLATE mapping, only the first occurrence of a character in the second argument matters - any additional mapping (due to the appearance of the same character in the second argument more than once) is ignored.
You can also use REGEXP_REPLACE function. Try code below,
SELECT REGEXP_REPLACE('33-01-61ASDF6-0**(98)00[],./123', '([^[:digit:]])', NULL)
FROM DUAL;
SELECT regexp_replace('33-01-616-000','[^0-9]') digits_only FROM dual;
/

Comma inside like query fails to return any result

Using Oracle db,
Select name from name_table where name like 'abc%';
returns one row with value "abc, cd" but when I do a select query with a comma before % in my like query, it fails to return any value.
Select name from name_table where name like 'abc,%';
returns no row. How can I handle a comma before % in the like query?
Example:
Database has "Sam, Smith" in the name column when the like has "Sam%" it returns one row, when i do "Sam,%" it doesn't return any row
NOT AN ANSWER but posting it as one since I can't format in a comment.
Look at this and use DUMP() on your own machine... see if this helps.
SQL> select dump('Smith, Stan') from dual;
DUMP('SMITH,STAN')
-----------------------------------------------------
Typ=96 Len=11: 83,109,105,116,104,44,32,83,116,97,110
If you count, the string is 11 characters (including the comma and the space). The comma is character 44, and the space is character 32. If you look at YOUR string and you don't see 44 where the comma should be, you will know that's the problem. You could then let us know what you see there (just for that character, I understand posting "Leno, Jay" would be a violation of privacy).
Also, make sure you don't have any extra characters (perhaps non-printable ones!) right before the comma. Just compare the two strings you are using as inputs and see where the differences may be.

Teradata substring out of bounds

I'm having issues figuring out the bounds between a substring. For example for the string 063016_shape_tea_cleanse__emshptea1_I want to substring out emshptea1, but it also has to work for the string 063016_shape_tea_cleanse__emshptea1_TESTDATA_HERE.
Currently I have:
sel SUBSTR('063016_shape_tea_cleanse__emshptea1_',POSITION('__' IN '063016_shape_tea_cleanse__emshptea1_')+2,
POSITION('_' IN SUBSTR('063016_shape_tea_cleanse__emshptea1_',POSITION('__' IN '063016_shape_tea_cleanse__emshptea1_') + 2,CHARACTER_LENGTH('063016_shape_tea_cleanse__emshptea1_') - (POSITION('__' IN '063016_shape_tea_cleanse__emshptea1_') + 2)))-1)
But that is erroring out due to it trying to substring 27 to -1.
You might use a regular expression, this will extract everything between __ and the following _ or end of string:
REGEXP_SUBSTR(col, '(?<=__).+?(?=(_|$))')
'(?<= )' is a look-behind, i.e search for previous characters without adding it to the result. Here: search for __
'.+' matches any character, one or multiple times. This would match until the end of the string ("greedy"), '?' ("lazy") prevents that.
'(?= )' is a look-ahead, i.e. search for following characters without adding it to the result.
( | ) The pipe splits an expression in multiple alternatives. Here either an underscore character or the end of the string $

Oracle: remove first 4 characters from a string

So I want to remove the first 4 characters from a string in oracle. Those characters can be different every time.
In my case I need to take away the first 4 characters of an IBAN and put them at the end of the string. I got the part of putting them to the end of the string but I can't get the first 4 characters to be removed. Every solution I find on the internet removes specified characters, not characters from a certain position in the string (1 to 4).
I used the code below to get the first 4 characters to the end of the string and wanted to try something similar for removing them at the front but without success.
SELECT SUBSTR(iban_nummer, 1, 4) INTO iban_substring FROM dual;
iban_nummer := iban_nummer || iban_substring;
See the docs:
substring_length ...
When you do not specify a value for this argument, then the function returns all characters to the end of string. When you specify
a value that is less than 1, the function returns NA.
So iban_nummer := substr(iban_nummer, 5) || substr(iban_nummer, 1,4) should work. The first part selects all characters beginning from the 5th, the second character numbers 1..4.
update table_name set col_name=substr(col_name,5);
try regexp, like:
SELECT regexp_replace(t.iban_nummer,'(.{4})(.*)','\2\1') FROM t;
Alternative way using regexp :
SELECT regexp_replace(t.iban_nummer,'^.{4}(.*)','\2\1') FROM dual;