Concatenate Regular Expressions - sql

I have a situation where a string should match one pattern or the other. I tried several options, but none works. If I use both patterns independently they work, but when I concatenate using Pipe ["|"] operator, outcome is not correct. Any help is much appreciated. Thank you in advance.
Select 'P' from dual Where REGEXP_LIKE('W777AA,WZGET0,WZGEG0','(^W[0-9A-Z]{5}(,W[0-9A-Z]{5}){0,3}$)')
Select 'P' from dual Where REGEXP_LIKE('WZGET%','%$')
Concatenate SQL:
Select 'P' from dual Where REGEXP_LIKE ('W777AA,WZGET0,WZGEG0','(^W[0-9A-Z]{5}(,W[0-9A-Z]{5}){0,3}$ | (%$))')

Smells like an order of operations issue. Try throwing in some parens, and drop the spaces around the |, since blanks inside the pattern are matched literally. Something like this:
Select 'P' from dual Where REGEXP_LIKE('W777AA,WZGET0,WZGEG0','((^W[0-9A-Z]{5}(,W[0-9A-Z]{5}){0,3}$)|(%$))')
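To see why the combined pattern fails, here is a minimal sketch using Python's re module as a stand-in for REGEXP_LIKE (an assumption: the two regex engines differ in places, but both treat a blank inside a pattern as a literal character by default). The helper name matches is hypothetical.

```python
import re

# Python's `re` stands in for Oracle's REGEXP_LIKE in this sketch.
LIST_PATTERN = r"^W[0-9A-Z]{5}(,W[0-9A-Z]{5}){0,3}$"
PERCENT_PATTERN = r"%$"

# The question's combined pattern: the blanks around "|" are part of the pattern.
with_spaces = "(" + LIST_PATTERN + " | (" + PERCENT_PATTERN + "))"
# Each branch grouped, no stray blanks.
without_spaces = "((" + LIST_PATTERN + ")|(" + PERCENT_PATTERN + "))"


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in text (like REGEXP_LIKE)."""
    return re.search(pattern, text) is not None


for text in ["W777AA,WZGET0,WZGEG0", "WZGET%"]:
    print(text, matches(with_spaces, text), matches(without_spaces, text))
```

With the blanks in place, the first branch requires a space after the end of the string and the second requires a space before the %, so neither sample can match. Without them, both samples match.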

Related

ORACLE: How to use regexp_like to find a string with single quotes between two characters?

I need to query the DB for all records that have two single quotes between characters. Example: We've, who's.
I have the regex https://regex101.com/r/6MtB9j/1 but it doesn't work with REGEXP_LIKE.
Tried this
SELECT content
FROM MyTable
WHERE REGEXP_LIKE (content, '(?<=[a-zA-Z])''(?=[a-zA-Z])')
Appreciate the help!
Oracle regex does not support lookarounds.
You do not actually need lookaround in this case, you can use
SELECT content
FROM MyTable
WHERE REGEXP_LIKE (content, '[a-zA-Z]''[a-zA-Z]')
This will work since REGEXP_LIKE only attempts one match, and if there is a match, it returns true, otherwise, false (eventually, fetching a record or not).
Lookarounds are useful in case you need to replace or extract values, when matches may overlap.
If you just need a single quote in a string, you can use:
where content like '%''%'
If they specifically need to be letters, then you need a regular expression:
regexp_like(content, '[a-zA-Z][''][a-zA-Z]')
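or, using Oracle's alternative quoting so that the quote does not have to be doubled (a backslash does not escape a quote inside an Oracle string literal):
regexp_like(content, q'{[a-zA-Z]'[a-zA-Z]}')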
If I understand correctly, you may need something like
regexp_count(content, '[a-zA-Z]''[a-zA-Z]') = 2.
For example, this
with myTable(content) as
(
select q'[what's]' from dual union all
select q'[who's, what's]' from dual union all
select q'[who's, what's, I'm]' from dual
)
select *
from myTable
where regexp_count(content, '[a-zA-Z]''[a-zA-Z]') = 2
gives
CONTENT
------------------
who's, what's
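The difference between "contains at least one letter-quote-letter" and "contains exactly two" can be sketched in Python, with re.search standing in for REGEXP_LIKE and len(re.findall(...)) standing in for REGEXP_COUNT. This is an illustration under that assumption, not Oracle code; the function names are hypothetical.

```python
import re

# A single quote with a letter on each side; no lookarounds needed
# when we only test for presence or count non-overlapping matches.
QUOTE_BETWEEN_LETTERS = re.compile(r"[a-zA-Z]'[a-zA-Z]")


def has_quote_between_letters(text: str) -> bool:
    """Rough analogue of REGEXP_LIKE(content, '[a-zA-Z]''[a-zA-Z]')."""
    return QUOTE_BETWEEN_LETTERS.search(text) is not None


def count_quotes_between_letters(text: str) -> int:
    """Rough analogue of REGEXP_COUNT(content, '[a-zA-Z]''[a-zA-Z]')."""
    return len(QUOTE_BETWEEN_LETTERS.findall(text))


rows = ["what's", "who's, what's", "who's, what's, I'm"]
exactly_two = [r for r in rows if count_quotes_between_letters(r) == 2]
print(exactly_two)
```

All three sample rows pass the presence test, but only the second one has a count of exactly 2.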

How can I exclude all letters without making 26 different statements?

I would like to write a simpler statement instead of 26 separate "NOT LIKE" conditions. Does anyone have an idea how to do that, so that I can cover every letter of the alphabet instead of listing the letters one by one? Thank you.
SELECT *
FROM name
WHERE flag LIKE 'Y'
AND name.autotrackchild IS NULL
AND substring(name.lot,LENGTH(name.lot),length(name.lot)) NOT LIKE 'A'
AND substring(name.lot,LENGTH(name.lot),length(name.lot)) NOT LIKE 'B'
AND substring(name.lot,LENGTH(name.lot),length(name.lot)) NOT LIKE 'C'
--REMOVES CHILD LOTS (ANYTHING WITH A LETTER ON THE END OF ITS LENGTH)
You would use regular expressions:
where flag like 'Y' and
regexp_like(name.lot, '[^A-Z]$')
The following is sufficient for the goal:
and right(name.lot, 1) not between 'A' and 'Z'
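As a quick check of what the '[^A-Z]$' pattern keeps and drops, here is a small Python sketch (Python's re is an assumed stand-in for regexp_like, and the lot values are made up for illustration).

```python
import re

# Last character is anything other than an uppercase A-Z letter.
NOT_LETTER_AT_END = re.compile(r"[^A-Z]$")


def is_parent_lot(lot: str) -> bool:
    """True when the lot does not end in an uppercase letter."""
    return NOT_LETTER_AT_END.search(lot) is not None


# Hypothetical lot numbers, not taken from the question.
lots = ["12345", "12345A", "778B", "9001"]
print([lot for lot in lots if is_parent_lot(lot)])
```

Note that an empty string is rejected as well, because the pattern needs one character to match; lowercase letters are kept, since the class only excludes A-Z.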

Using regexp_like

I have a query where I am using regexp_like, and I need more than one parameter, something like this:
WHERE regexp_like (FILENAME,'_G_',) or (FILENAME,'_Z_',) or (FILENAME,'_M_',)
Thanks in advance
You can factorize the regexp as follows:
WHERE regexp_like (FILENAME,'_[GMZ]_')
[GMZ] represents a custom character class made of characters 'G', 'M' and 'Z'.
You can use the following regexp:
regexp_like (FILENAME,'.{1}[GZM]{1}.{1}')
Here . (dot) represents any character
{1} means the preceding pattern must occur exactly once.
Cheers!!
If you want to add two or more different parameters that do not have a lot in common, you can use | to separate them like this:
select *
from table_name
WHERE regexp_like (FILENAME,'_G_|-kk_|-AH-');
I do not know exactly what you want when you ask to "order by it", but try this:
select id, filename
from table_name
WHERE regexp_like (FILENAME,'_G_|-kk_|-AH-')
order by filename
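The three patterns suggested in this thread are not all equivalent. A rough Python sketch (re as an assumed stand-in for regexp_like, with made-up file names) shows that the character class and the explicit alternation agree, while the dot-based pattern is broader because . accepts any character, not just an underscore.

```python
import re

char_class = re.compile(r"_[GMZ]_")            # one of G, M, Z between underscores
alternation = re.compile(r"_G_|_Z_|_M_")       # same thing spelled out with |
any_neighbours = re.compile(r".{1}[GZM]{1}.{1}")  # G, Z or M between ANY two characters

# Hypothetical file names for illustration only.
filenames = [
    "report_G_2020.csv",
    "report_Z_2020.csv",
    "report_M_2020.csv",
    "report_X_2020.csv",
    "reportAGB2020.csv",
]

for name in filenames:
    print(
        name,
        bool(char_class.search(name)),
        bool(alternation.search(name)),
        bool(any_neighbours.search(name)),
    )
```

Only the last file name separates the patterns: it has no underscores, so the first two reject it, while the dot-based pattern accepts it via "AGB".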

How to use dash (-) character in LIKE query

In my database I have to filter records where the name ends with -N,
but when I write the WHERE clause as in the following query, it returns no records. I suspect this is because - is a wildcard character.
I am using this query in Oracle database:
select * from product where productname like '%-N'
but the database has records that end with this product name
At first I thought that Oracle allows a range such as [a-z] to be specified in the LIKE operator, which would mean that - needs special treatment. So, my suggestion was to escape the dash:
select * from product where productname like '%\-N' ESCAPE '\'
https://docs.oracle.com/cd/B13789_01/server.101/b10759/conditions016.htm
On the other hand, as @Amadan correctly said in the comment, Oracle's LIKE operator only recognises two wildcard characters: _ and %.
It means that escaping the - should not change anything.
Which means that most likely the dash symbol in the query is not the same dash symbol that you have in your table. There are many different dashes and hyphens in Unicode. Here are the most common: Hyphen-Minus (0x002D), En-Dash (0x2013, Alt+0150), Em-Dash (0x2014, Alt+0151).
- – —
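One way to check which dash a value really contains is to look at its code point. The following Python sketch illustrates the idea on made-up product names (the data and the helper name dash_report are hypothetical; in the database itself you would inspect the stored bytes or code points instead).

```python
import unicodedata


def dash_report(name: str) -> tuple:
    """Return (ends with ASCII '-N', code point and Unicode name of the
    character just before the final letter)."""
    dash = name[-2]
    return (
        name.endswith("-N"),
        f"U+{ord(dash):04X}",
        unicodedata.name(dash),
    )


# Hypothetical product names: they look alike but use different dashes.
names = ["ABC-N", "ABC\u2013N", "ABC\u2014N"]
for name in names:
    print(repr(name), dash_report(name))
```

Only the first name ends with the ASCII hyphen-minus, so only that one would be found by a comparison against '-N'.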
'-' is not a wildcard for like (as mentioned elsewhere).
So, start with names that end in 'N':
where productname like '%N'
Does this do what you want?
If not, you can then go to a regular expression. For instance, to find anything other than a digit or letter before the 'N':
where regexp_like(productname, '[^a-zA-Z0-9]N$')
You can refine regexp_like() if this doesn't return what you expect.
Your query should work as expected. Here's an example:
WITH cteData as (SELECT 'ABC-N' AS PRODUCTNAME FROM DUAL UNION ALL
SELECT 'ABCN' AS PRODUCTNAME FROM DUAL UNION ALL
SELECT 'ABC' AS PRODUCTNAME FROM DUAL UNION ALL
SELECT 'DEFGHI-J-K-L-M-N' AS PRODUCTNAME FROM DUAL UNION ALL
SELECT 'DEFGHI-J-K-L-M-' AS PRODUCTNAME FROM DUAL UNION ALL
SELECT 'MY DOG HAS FLEAS' AS PRODUCTNAME FROM DUAL)
SELECT *
FROM cteData
WHERE PRODUCTNAME LIKE '%-N';
As expected, this returns:
ABC-N
DEFGHI-J-K-L-M-N
If you're not getting the results you expected there's something else going on that you haven't shown us.
Best of luck.

How to match and replace sections of a string in SQL

I'm pulling a list of popular sites from my database, but I want to combine results that are from the same domain. I've been able to do this partially by using:
REGEXP_REPLACE(site, '%|^www([123])?\.|^m\.|^mobile\.|^desktop\.')) as site
so that "www.facebook.com" and "facebook.com" or "m.facebook.com"
- all of which appear in the database - are treated as the same when I do a select distinct.
However, I want to take this a step further by writing an expression that looks at each string between periods. If a match is found consecutively in three or more strings between periods, then I want to treat those as the same. I simply can't predict every possible string that could come before "facebook.com", or any other site.
So for example:
"my.careerone.com.au" and
"careerone.com.au" match in three places.
Or "yahoo.realestate.com.au" and "rs.realestate.com.au" match in three places.
Any ideas on how to achieve this?
@David's code will work in Vertica as well, though possibly not so well performance-wise.
You can use Vertica's own internal functions such as TRIM & REGEXP_REPLACE.
After borrowing @David Faber's regular expression, I ended up with this:
select TRIM(LEADING '.' from REGEXP_REPLACE(col_name,'^.*((\.[^.]+){3})$', '\1')) AS fixed_dn from table_name;
I don't have Vertica available so I tested this in Oracle SQL (which does have REGEXP_REPLACE() that is similar to Vertica's). Not sure what the CTE syntax would be in Vertica but you'll be querying against a table anyway:
WITH d1 AS (
SELECT 'my.careerone.com.au' AS domain_nm FROM dual
UNION ALL
SELECT 'careerone.com.au' FROM dual
UNION ALL
SELECT 'yahoo.realestate.com.au' FROM dual
UNION ALL
SELECT 'rs.realestate.com.au' FROM dual
)
SELECT domain_nm, TRIM('.' FROM REGEXP_REPLACE(domain_nm, '^.*((\.[^.]+){3})$', '\1')) AS domain_nm_fix
FROM d1;
What REGEXP_REPLACE() does here is strip the highest-level subdomains from the domain name when there are more than three levels, keeping the last three levels together with the dot that precedes them; that is why the leading . character then has to be trimmed. If there are only three levels, nothing is replaced because the regex does not match. So, for example, careerone.com.au is unaltered, while my.careerone.com.au is changed to .careerone.com.au by REGEXP_REPLACE(), from which the leading . is then trimmed.
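The same replace-then-trim step can be traced in Python, using re.sub as an assumed stand-in for REGEXP_REPLACE and str.lstrip for the TRIM of the leading dot. The function name normalize_domain is hypothetical, and this is a sketch of the technique rather than Vertica or Oracle code.

```python
import re

# Same pattern as in the answers: capture the last three dot-prefixed labels.
LAST_THREE_LABELS = re.compile(r"^.*((\.[^.]+){3})$")


def normalize_domain(domain: str) -> str:
    """Reduce a host name to its last three labels.

    When the pattern matches, group 1 still starts with ".", so the
    leading dot is stripped afterwards. When it does not match (three
    labels or fewer), the input is returned unchanged.
    """
    return LAST_THREE_LABELS.sub(r"\1", domain).lstrip(".")


domains = [
    "my.careerone.com.au",
    "careerone.com.au",
    "yahoo.realestate.com.au",
    "rs.realestate.com.au",
]
for d in domains:
    print(d, "->", normalize_domain(d))
```

Keep in mind that this always keeps three labels, so a host such as www.facebook.com stays as it is; the prefix-stripping expression from the question would still be needed for two-label domains.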