extracting special character from a string in oracle sql - sql

I need a query to remove all alphanumeric characters from a string and give me only special characters.
If string is '##45gr##3' query should give me '####'.

SELECT REGEXP_REPLACE('##45gr##3','[^[:punct:]'' '']', NULL) FROM dual;

The old-fashioned way, with a replace() call:
select translate(upper(txt)
, '.1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ'
, '.') as spec_txt
from t42
/
With a replace() solution is better to type all the characters we want to exclude. Yes that is the larger set (probably) but at least it's well-defined. We don't want to keep revising the code to handle new characters in the input string.
Obviously regex solution is more compact and who can deny the elegance of #venkatesh solution? However, evaluating regular expressions is more expensive than more focused SQL functions, so it's worth knowing alternative ways of doing things, for those cases when performance is an issue.

Everything written in comments is most probably very true (especially the 5th one that talks about exceptions, special cases etc.). My feeling is that Jörg knows more about regular expressions than I'll ever know.
Anyway, to cut a long story short, in its very simple appearance, regarding the question posted ("remove all numbers and letters"), something like this might work:
SQL> select regexp_replace('a##45gr##3$.', '[[:digit:]]+|[[:alpha:]]+', '') result
2 from dual;
RESULT
------
####$.
SQL>

Related

Splitting variable content in SQL

I have a variable in a stored procedure that contains a string of characters like
[Tag]MESSAGE[/Tag]
I need a way to get the MESSAGE part from within the tags.
Any help would be much appreciated
Note: I have tested it on Oracle RDBMS
A more reliable approach is to use REGEXP_REPLACE.
REGEXP_REPLACE(value, pattern)
Example
SELECT REGEXP_REPLACE(
'<Tag>Message</Tag>',
'\s*</?\w+((\s+\w+(\s*=\s*(".*?"|''.*?''|[^''">\s]+))?)+\s*|\s*)/?>\s*') FROM DUAL;
Just replace "<" with "[" if your tags are different
What you need is this:
SELECT SUBSTRING(ColumnName,CHARINDEX('html_tag',ColumnName)+LEN('html_tag'),CHARINDEX('html_close_tag',ColumnName)-LEN('html_close_tag')) FROM TableName
You'll require to change the html_tag and html_close_tag with your own HTML tag that you want to get rid of.
If the column contains only single tag, simple call of substring function should be enough. Otherwise there will always be some point where regular expression does not suffice since you fall into trap (see this legendary StackOverflow answer).

Pattern matching in Big query vs SSMS-Return strings which contain special characters or numerics

I'm a bit lost.
I've had a look at the documentation but I'm not sure if you can use LIKE and pattern match in Big Query the same as SSMS.
The code shown here works in SSMS but the results are not correct in Big Query, so was wondering if there was another way to do it.
WHERE column_name NOT LIKE '[a-Z]%'
I'm looking to return strings which contain special characters or numerics.
Use REGEXP_CONTAINS instead
where not regexp_contains(column_name, r'[a-zA-Z]')
Meantime, LIKE is also supported as a comparison operator

How to select values around .(dot) using sql

I am running below query in Teradata :
sel requesttext from dbc.tables
where tablename='old_employee_table'
Result:
alter table DB_NAME.employee_table,no fallback ;
I want to get below result using SQL:
DB_NAME.employee_table
Requesttext can be:
create set table DB_NAME.employee_table;
DB Name and table can occur anywhere in the result. Since .(dot) is joining them that's why i want to split with .(dot).
Basically I need sql which can result me surrounding values of .(dot)
I want DBName and Tablename in result.
I'm not a Teradata person, but this should work for both strings given so far, as long as teradata's regexp_substr() supports positive look-behind and positive look-ahead assertions (I might have the Teradata syntax wrong, so a little tweaking may be needed):
SELECT REGEXP_SUBSTR(requesttext, '(?<= )(\w+\.\w+)(?=[,$]?)', 1, 1)
FROM dbc.tables
WHERE tablename='old_employee_table'
See the regex101 example. Hopefully it translates to Teradata easily.
The regex looks for and returns the words either side of and including the period, when preceded by a space, and followed by an optional comma or the end of the line.
You could do this with either regexp_substr() or strtok().
As Jamie Zawinski said:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
So I would go with the strtok() method. Also I'm lazy and regular expressions are hard.
Function strtok() takes three arguments:
The string being split
The delimiter to split the string
The number of the token to grab.
To get at the <database>.<table> from that string that is returned in your query, we can split by a space, grab the third token, then split that by a comma and grab the first token.
That would look like:
SELECT strtok(strtok(requestText,' ',3),',',1)
FROM dbc.tables
WHERE tablename='old_employee_table'

Using substr to trim string on Oracle

I want to trim a string to a specified length. If the string is shorter, I don't want to do anything. I found a function substr() which does the job. However there is nothing in the Oracle documentation what happens if the string is shorter, than maximal length.
For example this:
select substr('abc',1,5) from dual;
returns 'abc', which is what I need.
I'd like to ask if this is safe, because the function seems not to be defined for this usage. Is there a better way how to truncate?
It is totally ok, but if you want, you can use this query:
select substr('abc',1,least(5,length('abc'))) from dual;
This is an interesting question. Surprisingly, the documentation doesn't seem to cover this point explicitly.
I think what you are doing is quite safe. substr() is not going to "add" characters to the end of the string when the string is too short. I have depended on this behavior in many databases, including Oracle, over time. This is how similar functions work in other databases and most languages.
The one sort-of-exception would be when the original data type is a char() rather than varchar2() type. In this case, the function would return a string of the same type, so it might be padded with spaces. That, though, is a property of the type not really of the function.
If you want to be absolutely certain that you won't end up with trailing blanks by using SUBSTR alone (you won't, but sometimes it's comforting be really sure) you can use:
SELECT RTRIM(SUBSTR('abc',1,5)) FROM DUAL;
Share and enjoy.
It is better to use the below query
SELECT SUBSTR('abc',1,LEAST(5,LENGTH('abc'))) FROM DUAL;
Above query would either take the length of the string or the number 5 whichever is lower.

Redshift SQL - Extract numbers from string

In Amazon Redshift tables, I have a string column from which I need to extract numbers only out. For this currently I use
translate(stringfield, '0123456789'||stringfield, '0123456789')
I was trying out REPLACE function, but its not gonna be elegant.
Any thoughts with converting the string into ASCII first and then doing some operation to extract only number? Or any other alternatives.
It is hard here as Redshift do not support functions and is missing lot of traditional functions.
Edit:
Trying out the below, but it only returns 051-a92 where as I need 05192 as output. I am thinking of substring etc, but I only have regexp_substr available right now. How do I get rid of any characters in between
select REGEXP_SUBSTR('somestring-051-a92', '[0-9]+..[0-9]+', 1)
might be late but I was solving the same problem and finally came up with this
select REGEXP_replace('somestring-051-a92', '[a-z/-]', '')
alternatively, you can create a Python UDF now
Typically your inputs will conform to some sort of pattern that can be used to do the parsing using SUBSTRING() with CHARINDEX() { aka STRPOS(), POSITION() }.
E.g. find the first hyphen and the second hyphen and take the data between them.
If not (and assuming your character range is limited to ASCII) then your best bet would be to nest 26+ REPLACE() functions to remove all of the standard alpha characters (and any punctuation as well).
If you have multibyte characters in your data though then this is a non-starter.
Better method is to remove all the non-numeric values:
select REGEXP_replace('somestring-051-a92', '[^0-9]', '')
You can specify "any non digit" that includes non-printable, symbols, alpha, etc.
e.g., regexp_replace('brws--A*1','[\D]')
returns
"1"