How to remove non-numeric characters (except full stop ".") from a string in Amazon Redshift - sql

I have been trying to figure out how to remove multiple non-numeric characters except full stop ("."), or return only the numeric characters with full stop (".") from a string. I've tried:
SELECT regexp_replace('~$$$1$$#1633,123.60&&!!__!', '[^0-9]+', '')
This query returns the following result: 1163312360
But I want the result as 11633123.60

Please try this:
The regexp_replace expression below removes every character that is not ("^") a digit (0-9) or a full stop ("."). Redshift does not need a FROM clause for a constant expression:
SELECT regexp_replace('ABC$$$%%11633123.60', '[^0-9.]', '');
It returns the expected output "11633123.60"
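For anyone who wants to check the character class outside the database, here is a minimal sketch in Python. It assumes Python's re module treats these two simple bracket expressions the same way Redshift does, and the function names are only illustrative.

```python
import re

def digits_only(text):
    # Mirrors the question's pattern: the full stop is not in the class, so it is removed too.
    return re.sub(r"[^0-9]+", "", text)

def digits_and_dot(text):
    # Mirrors the answer's pattern: "." is listed inside the negated class, so it survives.
    return re.sub(r"[^0-9.]", "", text)

sample = "~$$$1$$#1633,123.60&&!!__!"
print(digits_only(sample))
print(digits_and_dot(sample))
```

Inside a bracket expression the full stop is a literal character, so it needs no escaping.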

Related

Extract Specific Set of data from a String in Oracle

I have the string '1_A_B_C_D_E_1_2_3_4_5' and I am trying to extract 'A_B_C_D_E'. In other words, I want to remove the leading 1_ and the trailing _1_2_3_4_5, which is essentially the numeric portion of the string. Any special characters after the last letter must also be removed; in this example, the _ after the character E must not be present.
The query I am trying is below:
SELECT
REGEXP_SUBSTR('1_A_B_C_D_E_1_2_3_4_5','[^0-9]+',1,1)
from dual
The data I get from the above query is:
_A_B_C_D_E_
I am trying to figure a way to remove the underscore towards the end. Any other way to approach this?
Assuming the "letters" come first and then the "digits", you could do something like this:
select regexp_substr('A_B_C_D_E_1_2_3_4_5','.*[A-Z]') from dual;
This will pull all the characters from the beginning of the string, up to the last upper-case letter in the string (.* is greedy, it will extend as far as possible while still allowing for one more upper-case letter to complete the match).
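A quick way to see the greedy behaviour, and why the "letters come first" assumption matters, is this Python sketch. It assumes Python's re engine backtracks the same way as Oracle's for this pattern, and the helper name is made up for illustration.

```python
import re

def up_to_last_upper(text):
    # ".*" grabs as much as it can, then backs off until "[A-Z]" can still match.
    match = re.search(r".*[A-Z]", text)
    return match.group(0) if match else None

print(up_to_last_upper("A_B_C_D_E_1_2_3_4_5"))    # letters first
print(up_to_last_upper("1_A_B_C_D_E_1_2_3_4_5"))  # leading "1_" is kept
```

With the original input that starts with 1_, the leading digits stay in the result, so this approach only fits strings that really do begin with the letters.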
I have the string '1_A_B_C_D_E_1_2_3_4_5' and I am trying to extract the data 'A_B_C_D_E'
Use REGEXP_REPLACE:
SELECT trim(BOTH '_' FROM
(REGEXP_REPLACE('1_A_B_C_D_E_1_2_3_4_5','[0-9]+', ''))) str
FROM dual;
STR
---------
A_B_C_D_E
How it works:
REGEXP_REPLACE removes every run of digits '[0-9]+' from the string. Alternatively, you could use the POSIX character class '[[:digit:]]+'.
TRIM BOTH '_' removes any leading and trailing _ from the string.
Also using REGEXP_SUBSTR:
SELECT trim(BOTH '_' FROM
(REGEXP_SUBSTR('1_A_B_C_D_E_1_2_3_4_5','[^0-9]+'))) str
FROM dual;
STR
---------
A_B_C_D_E
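The two variants above can be compared side by side with a small Python sketch. This is only a mirror of the logic, with str.strip standing in for TRIM BOTH, and the function names are hypothetical.

```python
import re

def via_replace(text):
    # Like REGEXP_REPLACE(..., '[0-9]+', ''): drop every digit run, then trim underscores.
    return re.sub(r"[0-9]+", "", text).strip("_")

def via_substr(text):
    # Like REGEXP_SUBSTR(..., '[^0-9]+'): take the first run of non-digits, then trim.
    match = re.search(r"[^0-9]+", text)
    return match.group(0).strip("_") if match else None

value = "1_A_B_C_D_E_1_2_3_4_5"
print(via_replace(value))
print(via_substr(value))
```

Both give the same answer here, but they differ when digits appear between the letters: the replace variant keeps every letter, while the substring variant stops at the first digit after the letters begin.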

Teradata substring out of bounds

I'm having issues figuring out the bounds of a substring. For example, for the string 063016_shape_tea_cleanse__emshptea1_ I want to substring out emshptea1, but it also has to work for the string 063016_shape_tea_cleanse__emshptea1_TESTDATA_HERE.
Currently I have:
sel SUBSTR('063016_shape_tea_cleanse__emshptea1_',POSITION('__' IN '063016_shape_tea_cleanse__emshptea1_')+2,
POSITION('_' IN SUBSTR('063016_shape_tea_cleanse__emshptea1_',POSITION('__' IN '063016_shape_tea_cleanse__emshptea1_') + 2,CHARACTER_LENGTH('063016_shape_tea_cleanse__emshptea1_') - (POSITION('__' IN '063016_shape_tea_cleanse__emshptea1_') + 2)))-1)
But that errors out because it tries to substring from 27 with a length of -1.
You might use a regular expression. This will extract everything between __ and the following _ or the end of the string:
REGEXP_SUBSTR(col, '(?<=__).+?(?=(_|$))')
'(?<= )' is a look-behind, i.e. it checks the preceding characters without adding them to the result. Here: search for __
'.+' matches any character, one or more times. On its own it would match to the end of the string ("greedy"); the trailing '?' makes it "lazy", so it stops at the first position where the look-ahead succeeds.
'(?= )' is a look-ahead, i.e. it checks the following characters without adding them to the result.
( | ) The pipe splits an expression into multiple alternatives. Here: either an underscore character or the end of the string $
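To try the pattern against both sample strings without a Teradata session, here is a hedged Python sketch. It assumes Python's re handles this look-behind and look-ahead the same way as Teradata's REGEXP_SUBSTR, and extract_code is an illustrative name only.

```python
import re

# Same pattern as in the answer: look-behind for "__", lazy match, look-ahead for "_" or end.
PATTERN = re.compile(r"(?<=__).+?(?=(_|$))")

def extract_code(value):
    match = PATTERN.search(value)
    return match.group(0) if match else None

print(extract_code("063016_shape_tea_cleanse__emshptea1_"))
print(extract_code("063016_shape_tea_cleanse__emshptea1_TESTDATA_HERE"))
print(extract_code("063016_shape_tea_cleanse__emshptea1"))
```

The third call has no trailing underscore, which is the case the $ alternative covers.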

Replace function doesn't work as expected

I'm having trouble figuring out why REPLACE() doesn't work correctly.
I'm getting a string formatted as:
RISHON_LEZION-CMTSDV4,Cable7/0/4/U1;RISHON_LEZION-CMTSDV4,Cable7/0/4/U2;RISHON_LEZION-CMTSDV4,Cable7/0/5/U0;.....
It can be up to 4000 characters long.
Each ; marks the start of a new item (there can be up to about 15 items in one string). I'm splitting it by using REPLACE(): each occurrence of ; is replaced with $ + a line break + the entire string again + $ (I have another part that splits the string further).
I think the length of the string is somehow affecting the result, though I have never heard that REPLACE has any limitation on the length of the string.
SELECT REPLACE(HOT_ALERTKEY_PK, ';', '$' || CHR(13) || CHR(10) || HOT_ALERTKEY_PK || '$')
from (SELECT 'RISHON_LEZION-CMTSDV4,Cable7/0/3/U0;RISHON_LEZION-CMTSDV4,Cable7/0/3/U1;RISHON_LEZION-CMTSDV4,Cable7/0/3/U2;RISHON_LEZION-CMTSDV4,Cable7/0/4/U0;RISHON_LEZION-CMTSDV4,Cable7/0/4/U1;RISHON_LEZION-CMTSDV4,Cable7/0/4/U2;RISHON_LEZION-CMTSDV4,Cable7/0/5/U0;RISHON_LEZION-CMTSDV4,Cable7/0/5/U1;RISHON_LEZION-CMTSDV4,Cable7/0/5/U2;RISHON_LEZION-CMTSDV4,Cable7/0/7/U0;RISHON_LEZION-CMTSDV4,Cable7/0/7/U1;RISHON_LEZION-CMTSDV4,Cable7/0/7/U2;RISHON_LEZION-CMTSDV4,Cable7/0/9/U0;RISHON_LEZION-CMTSDV4,Cable7/0/9/U1;RISHON_LEZION-CMTSDV4,Cable7/0/9/U2' as hot_alertkey_pk
FROM dual)
For some reason this splits the string correctly up to Cable7/0/5/U0; and then stops. If I remove one or more items from the start of the string (each item runs up to the semicolon), then I get correspondingly more cables at the end, according to how many I removed from the beginning.
Why is this happening ?
Thanks in advance.
If you wrap your sample input string within to_clob() in the inner query, and you wrap the resulting string within length() in the outer query, you will find that the result is 8127 characters. This answers your question, but only partially.
I am not sure why replace doesn't throw an error, or perhaps just truncate the result at 4000 characters. I got exactly the same result as you did in Oracle 11.2, with the result chopped off after 3503 characters. I just looked quickly at the Oracle documentation for replace() and it doesn't say what the behavior should be if the input is VARCHAR2 but the output is more than 4000 characters. It looks as though it performed as many substitutions as it could and then it stopped (the next substitution would have gone above 4000 characters).
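The numbers above can be reproduced with plain string arithmetic. The Python sketch below rebuilds the sample input, performs the full replacement, and then models the guess that substitutions stop once the next one would exceed 4000 characters. That stopping rule is only an assumption that happens to match the observed 3503 characters, not documented Oracle behavior.

```python
# Rebuild the 15-item sample string from the question.
ports = [(p, u) for p in (3, 4, 5, 7, 9) for u in (0, 1, 2)]
pieces = [f"RISHON_LEZION-CMTSDV4,Cable7/0/{p}/U{u}" for p, u in ports]
key = ";".join(pieces)

# Each ';' becomes '$' + CR + LF + the whole string + '$'.
replacement = "$" + "\r\n" + key + "$"
full_result = key.replace(";", replacement)

# Hypothetical model: stop before the substitution that would pass the limit.
LIMIT = 4000
truncated = ""
for i, piece in enumerate(pieces):
    truncated += piece
    is_last = i == len(pieces) - 1
    if is_last or len(truncated) + len(replacement) > LIMIT:
        break
    truncated += replacement

print(len(key), len(full_result), len(truncated))
print(truncated[-13:])
```

The model stops right after Cable7/0/5/U0, which is where the question says the output ends.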

Remove last x characters until a specific character

I have this string /uk-en/contact-us/frequently-asked-questions/your-trip/there-wi-fi-access-in-the-eurostar-terminals-and-board-your-trains and I need to get just the last part of the URL (everything after the last /).
Then I want to replace '-' with a space. The strings do not all have the same number of characters.
How can I do this?
Thank you!
Solution using BigQuery functions:
select regexp_replace(last(split(x, "/")), "-", " ") from
(select
"/uk-en/contact-us/frequently-asked-questions/your-trip/there-wi-fi-access-in-the-eurostar-terminals-and-board-your-trains"
as x)
Here is what I tried in SQL Server:
DECLARE @s VARCHAR(max)= '/uk-en/contact-us/frequently-asked-questions/your-trip/there-wi-fi-access-in-the-eurostar-terminals-and-board-your-trains'
SELECT REVERSE(SUBSTRING(REVERSE(@s),CHARINDEX('/',REVERSE(@s)),LEN(REVERSE(@s))))+REVERSE(REPLACE(SUBSTRING(REVERSE(@s),1,CHARINDEX('/',REVERSE(@s))-1),'-',' '))
Sorry, this was for SQL Server.
Did you try using SPLIT in BigQuery?
SPLIT('str' [, 'delimiter']) Returns a set of substrings as a repeated string. If delimiter is specified, the SPLIT function breaks str into substrings, using delimiter as the delimiter.
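The split-then-replace idea is not specific to BigQuery. As a rough illustration of the same two steps, here is a Python sketch where last_segment_words is a made-up helper name.

```python
def last_segment_words(url_path):
    # Keep only the text after the last "/", then turn hyphens into spaces.
    last_part = url_path.rsplit("/", 1)[-1]
    return last_part.replace("-", " ")

path = ("/uk-en/contact-us/frequently-asked-questions/your-trip/"
        "there-wi-fi-access-in-the-eurostar-terminals-and-board-your-trains")
print(last_segment_words(path))
```

Because it splits on the last slash, the length of the earlier segments does not matter.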

Search an Oracle clob for special characters that are not escaped

Is it possible to run a query that searches an Oracle clob for any record containing an ampersand character, where the ampersand is not part of one of the following (or possibly any) escape codes:
& - &amp;
< - &lt;
> - &gt;
" - &quot;
' - &apos;
I want to extract 5 characters before the ampersand and 5 characters after the ampersand so I can see the actual value.
Basically I want to search for any record that contains those characters and replace them with the escape code.
At the moment i am doing something like this:
Select * from articles
where dbms_lob.instr(article_summary, '&amp;') = 0 and dbms_lob.instr(article_summary, '&') > 0
Update
If I were to use a regular expression, how would I specify it if I want to retrieve all fields where the value is & followed by any character other than 'a'?
You can use DBMS_XMLGEN.CONVERT for this. The second parameter is optional and, if left out, the function escapes the XML special characters.
select DBMS_XMLGEN.CONVERT(article_summary)
from articles;
But if the article summary contains a mixture of escaped and unescaped characters, this will give the wrong result, because the & of an already escaped code gets escaped again. The easiest way to solve that is to unescape the characters first and then escape them.
select DBMS_XMLGEN.CONVERT(
DBMS_XMLGEN.CONVERT(article_summary,1) --1 as parameter does unescaping
)
from articles;
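The unescape-then-escape idea can be tried outside Oracle with Python's xml.sax.saxutils. This is only an analogue of the nested DBMS_XMLGEN.CONVERT call, not the same function; the quote mappings and the name normalize_entities are assumptions added for illustration.

```python
from xml.sax.saxutils import escape, unescape

# Extra mappings for quotes; &, < and > are handled by the library itself.
QUOTE_ESCAPES = {'"': "&quot;", "'": "&apos;"}
QUOTE_UNESCAPES = {"&quot;": '"', "&apos;": "'"}

def normalize_entities(text):
    # Step 1: decode anything already escaped, so it is not escaped twice.
    plain = unescape(text, QUOTE_UNESCAPES)
    # Step 2: escape everything uniformly.
    return escape(plain, QUOTE_ESCAPES)

print(normalize_entities("Tom & Jerry &amp; Co <b>"))
```

Running the function a second time on its own output leaves the text unchanged, which is the property the mixed escaped and unescaped case needs.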