Escaping a single quote in Oracle regex query - sql

This is really starting to hurt!
I'm attempting to write a query in Oracle developer using a regex condition
My objective is to find all last names that contain charachters not commonly contained in names (non-alpha, spaces, hyphens and single quotes)
i.e.
I need to find
J00ls
McDonald "Macca"
Smithy (Smith)
and NOT find
Smith
Mckenzie-Smith
El Hassan
O'Dowd
My present query is
select * from dm_name
WHERE regexp_like(last_name, '([^A-Za-z -])')
and batch_id = 'ATEST';
which excludes everything expected except the single quote. When it comes to putting the single quote character, the Oracvel SQL Develoepr parser takes it as a literal.
I've tried:
\' -- but got a "missing right parenthesis" error
||chr(39)|| -- but the search returned nothing
'' -- negated the previous character in the matching group e.g. '([^A-Za-z -''])' made names with '-' return.
I'd appreciate anything you could offer.

Just double the single quote to escape your quote.
So
select *
from dm_name
where regexp_like(last_name, '[^A-Za-z ''-]')
and batch_id = 'ATEST'
See also this sqlfiddle. Note, I tried a similar query in SQL developer and that worked as well as the fiddle.
Note also, for this to work the - character has to be the last character in the group as otherwise it tries to find the group SPACE to ' rather than the character -.

The following works:
select *
from dm_name
WHERE regexp_like(last_name, '([^A-Za-z ''-])');
See this SQLFiddle.
Whether SQL Developer will like it or not is something I cannot attest to as I don't have that product installed.
Share and enjoy.

Related

How to find a row where col have special characters or numbers (except hyphen,apostrophe and space) in Oracle SQL

I need to find rows where col have special characters or numbers (except hyphen,apostrophe and space) in Oracle SQL.
I am doing like below:
SELECT *
FROM test
WHERE Name_test LIKE '%[^A-Za-z _]%'
But It is not working and I also need to exclude any apostrophe.
Kindly help.
If you need to find all rows where column have ONLY numbers and special characters (and you can specify all of required special characters):
SELECT *
FROM test
WHERE regexp_like(Name_test, q['^[0-9'%##]+$]')
as you can see you just need to add your special characters after 0-9.
^ - start
$ - end
About format q'[SOMETHING]' please see TEXT LITERALS here: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Literals.html#GUID-1824CBAA-6E16-4921-B2A6-112FB02248DA
If you need to find all rows where column have no alpha-characters:
SELECT *
FROM test
WHERE regexp_like(Name_test, '^[^a-zA-Z]*$');
or
SELECT *
FROM test
WHERE regexp_like(Name_test, '^\W*$');
about \W - please see "Table 8-5 PERL-Influenced Operators in Oracle SQL Regular Expressions" here:
https://docs.oracle.com/database/121/ADFNS/adfns_regexp.htm#ADFNS235
I need to find rows where col have special characters or numbers (except hyphen, apostrophe and space [and presumably single quotes]) in Oracle SQL.
You can use double single quotes to put a single quote in:
WHERE Name_test LIKE '%[^-A-Za-z _'']%'
However, this is not Oracle syntax. If the above works, then I would guess you are using SQL Server. In Oracle:
WHERE REGEXP_LIKE(Name_test, '[^A-Za-z _'']')

How can I remove characters in a string after a specific special character (~) in snowflake sql?

I am using Snowflake SQL. I would like to remove characters from a string after a special character ~. How can I do that?
here is the whole scenario. Let me explain. I do have a string like 'CK#123456~fndkjfgdjkg'. Now, i want only the number after #.And not anything after ~. This is number length varies for that field value. It might be 1 or 5 or 3. And i want to add the condition in where class where this number is equal to check_num from other table after joining. I am trying REGEXP_SUBSTR(A.SRC_TXT, '(?<=CK#)(.+?\b)') = C.CHK_NUM in the where condition. I am getting the error as 'No repititive argument after ?'
You can use a regex for this
-- To remove just the character after a ~
select regexp_replace('fo~o bar','~.', '');
-- returns 'fo bar'
--If you want to keep the ~
select regexp_replace('fo~o bar','~.', '~');
-- returns 'fo~ bar'
--If you want to remove everything after the ~
select regexp_replace('fo~o bar','~.*', '');
--returns 'fo'
If you need to remove other specific character sets after a ~, you can probably do this with a slightly more complicated regex, but I'd need examples of your desired input/output to help with that.
EDIT for updated question
This regex replace should get what you need.
select regexp_replace('CK#123456~fndkjfgdjkg','CK#(\\d*)~.*', '\\1');
-- returns 123456
(\\d*) gets ANY number of digits in a row, and the \\1 causes it to replace the match with what was in the first set of parenthesis, which is your list of digits. the CK# and ~.* are there to make sure the whole string gets matched and replaced.
If the CK# can vary as well, you can use .*? like this.
select regexp_replace('ABCD123HI#123456~fndkjfgdjkg','.*?#(\\d*)~.*', '\\1')
-- returns 123456
I'd probably do something like the following, easy enough but not as cool as RegEx type of functions.
set my_string='fooo~12345';
set search_for_me = '~';
SELECT SUBSTR($my_string, 1, DECODE(position($search_for_me, $my_string), 0, length($my_string), position($search_for_me, $my_string)));
I hope this helps...Rich
It looks like lookahead and lookbehinds are not supported in REGEXP functions, they seem to work in the PATTERN clause of a LIST command. Snowflake documentation makes no mention either way of lookahead or lookbehinds.
In your example:
It seems that the query engine is looking for that repeating argument, where you are attempting a lookbehind
You have not specified what you wanted extracted. You have two capture groups, but in this scenario everything would be returned
Since you are looking to remove everything after ~ you have a delimiter, why not use it in your REGEXP_SUBSTR function?
Try the following:
SELECT $1,REGEXP_SUBSTR($1,'\\w+#(.+?)~',1,1,'is',1)
FROM VALUES
('CK#123456~fndkjfgdjkg')
,('QH#128fklj924~fndkjfgdjkg')
;
This looks for:
One or more word characters
Followed by #
Capturing one or more characters upto and not including ~
Returns the characters within the capture group
You can change the .+? to \\d+? to make sure the pattern is only digits. Backslashes must be escaped with a backslash.
The descriptions for each argument of the function can be found here:
https://docs.snowflake.net/manuals/sql-reference/functions/regexp_substr.html
You could check this!!
select substr('CK#123456~fndkjfgdjkg',4,6) from dual;
OUTPUT
123456
https://docs.snowflake.net/manuals/sql-reference/functions/substr.html

Find phone numbers with unexpected characters using SQL

I posted the same question below for SQL in Oracle here and was provided the SQL info within that works.
However, I now need to perform the same in a DB2 database and if I attempt to run the same SQL it errors out.
I need to find rows where the phone number field contains unexpected characters.
Most of the values in this field look like:
123456-7890
This is expected. However, we are also seeing character values in this field such as * and #.
I want to find all rows where these unexpected character values exist.
Expected:
Numbers are expected
Hyphen with numbers is expected (hyphen alone is not)
NULL is expected
Empty is expected
This SQL works in Oracle:
...
WHERE regexp_like(phone_num, '[^ 0123456789-]|^-|-$')
When using the same SQL above in DB2, the statement errors out.
I found it easiest to answer your question by phrasing a regex which matches the positive cases. Then, we can just use NOT to find the negative cases. DB2 supports a REGEXP_LIKE function:
SELECT *
FROM yourTable
WHERE
NOT REGEXP_LIKE(phone_num, '^[0-9]+(-?[0-9]+)*$') AND
COALESCE(phone_num, '') <> '';
Here is a demo of the regex:
Demo
For newer version of db2, regexp is the way to go. If you are on an older version (perhaps why you get an error), you can replace all accepted chars with '' and check if the result is an empty string. Can't check right now, but from memory, it would be
WHERE TRANSLATE(phone_num, '', '0123456789-')<>''
EDIT:
For what it's worth your regexp works for V11 so you probably have an older version of Db2. Example of translate and regexp side by side:
]$ db2 "with t(s) as ( values '123456-7890', '12345*-7890' )
select s, 'regexp' as method from t
where regexp_like(s, '[^ 0123456789-]|^-|-$')
union all
select s, 'translate' as method
from t where TRANSLATE(s, '', '0123456789-')<>''"
S METHOD
----------- ---------
12345*-7890 translate
12345*-7890 regexp
2 record(s) selected.

What is this Oracle regexp matching in this production code?

Here's the code that is in production:
dynamic_sql := q'[ with cte as
select user_id,
user_name
from user_table
where regexp_like (bizz_buzz,'^[^Z][^Y6]]' || q'[') AND
user_code not in ('A','E','I')
order by 1]';
Start at the beginning and search bizz_buzz
Match any one character that is NOT Z
Match any two characters that are not Y6
What's the ']' after the 6?
Then what?
I think that StackOverflow's formatting is causing some of the confusion in the answers. Oracle has a syntax for a string literal, q'[...]', which means that the ... portion is to be interpreted exactly as-is; so for instance it can include single quotes without having to escape each one individually.
But the code formatting here doesn't understand that syntax, so it is treating each single-quote as a string delimiter, which makes the result look different that how Oracle really sees it.
The expression is concatenating two such string literals together. (I'm not sure why - it looks like it would be possible to write this as a single string literal with no issues.) As pointed out in another answer/comment, the resulting SQL string is actually:
with cte as
select user_id,
user_name
from user_table
where regexp_like (bizz_buzz,'^[^Z][^Y6]') AND
user_code not in ('A','E','I')
order by 1
And also as pointed out in another answer, the [^Y6] portion of the regex matches a single character, not two. So this expression should simply match any string whose first character is not 'Z' and whose second character is neither 'Y' nor '6'.
When not in couples ] means... Well... Itself:
^[^Z][^Y6]]/
^ assert position at start of the string
[^Z] match a single character not present in the list below
Z the literal character Z (case sensitive)
[^Y6] match a single character not present in the list below
Y6 a single character in the list Y6 literally (case sensitive)
] matches the character ] literally
Start at the beginning and search bizz_buzz
Match any one character that is NOT Z
Match any two one characters that is not Y or 6
What's the ']' after the 6? it's a ]
I'm afraid I have to post this here as the comment section is inappropriate for the formatting required. After your edit above that shows the entire statement, I ran this to see what the string ends up being:
select q'[ with cte as
select user_id,
user_name
from user_table
where regexp_like (bizz_buzz,'^[^Z][^Y6]]' || q'[') AND
user_code not in ('A','E','I')
order by 1]' txt
from dual;
It ended up yielding this:
with cte as
select user_id,
user_name
from user_table
where regexp_like (bizz_buzz,'^[^Z][^Y6]') AND
user_code not in ('A','E','I')
order by 1
It is apparent now that the closing bracket and quote at the end of the regex belong to the first alternate quote string and not to the regex. This is concatenating 2 alternate quoted strings which is a tad confusing as it sure looked like part of the regex. If anything you are learning the importance of comments for the poor person behind you! Please comment this accordingly when you are done figuring this out. Even include a link to this post.

How to escape square bracket when using LIKE? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
SQL Server LIKE containing bracket characters
I am having a problem with pattern matching.I have created two objects say,with codes
1)[blah1]
2)[blah2] respectively
in the search tab,suppose if i give "[blah" as the pattern,its returning all the strings
i.e., [blah1] and [blah2]
The query is
select *
from table1
where code like N'%[blah%'
I guess the problem is with the condition and special characters. Please do revert if you have as solution. Is there any solution where we can escape the character"[". I tried to change the condition as N'%[[blah%'.But even then its returning all the objects that is in the table.
When you don't close the square bracket, the result is not specified.
However, the story is different when you close the bracket, i.e.
select *
from table1
where code like N'%[blah%]%'
In this case, it becomes a match for (any) + any of ('b','l','a','h','%') + (any). For SQL Server, you can escape characters using the ESCAPE clause.
select * from table1 where code like N'%\[blah%\]%' escape '\'
SQL Fiddle with examples
You can escape a literal bracket character this way:
select *
from table1
where code like N'%[[]blah%'
Source: LIKE (Transact-SQL), under the section "Using Wildcard Characters As Literals."
I guess this is Microsoft's way of being consistent, since they use brackets to delimit table and column identifiers too. But the use of brackets is not standard SQL. For that matter, bracket as a metacharacter in LIKE patterns is not standard SQL either, so it's not necessary to escape it at all in other brands of database.
As per My understanding, the symbol '[', there is no effect in query. like if you query with symbol and without symbol it shows same result.
Either you can skip the unwanted character at UI Level.
select * from table1 where code like '%[blah%'
select * from table1 where code like '%blah%'
Both shows same result.