How to get the nth match from regexp_matches() as plain text - sql

I have this code:
with demo as (
select 'WWW.HELLO.COM' web
union all
select 'hi.co.uk' web)
select regexp_matches(replace(lower(web),'www.',''),'([^\.]*)') from demo
And the table I get is:
regexp_matches
{hello}
{hi}
What I would like to do is:
with demo as (
select 'WWW.HELLO.COM' web
union all
select 'hi.co.uk' web)
select regexp_matches(replace(lower(web),'www.',''),'([^\.]*)')[1] from demo
Or even the big query version:
with demo as (
select 'WWW.HELLO.COM' web
union all
select 'hi.co.uk' web)
select regexp_matches(replace(lower(web),'www.',''),'([^\.]*)')[offset(1)] from demo
But neither works. Is this possible? If it isn't clear, the result I would like is:
match
hello
hi

Use split_part() instead. Simpler, faster. To get the first word, before the first separator .:
WITH demo(web) AS (
VALUES
('WWW.HELLO.COM')
, ('hi.co.uk')
)
SELECT split_part(replace(lower(web), 'www.', ''), '.', 1)
FROM demo;
db<>fiddle here
See:
Split comma separated column data into additional columns
regexp_matches() returns setof text[], i.e. 0-n rows of text arrays. (Because each regular expression can result in a set of multiple matching strings.)
In Postgres 10 or later, there is also the simpler variant regexp_match() that only returns the first match, i.e. text[]. Either way, the surrounding curly braces in your result are the text representation of the array literal.
You can take the first row and unnest the first element of the array, but since you neither want the set nor the array to begin with, use split_part() instead. Simpler, faster, and less versatile. But good enough for the purpose. And it returns exactly what you want to begin with: text.
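To make the "plain text instead of an array" point concrete, here is a minimal Python sketch of what the split_part() query computes. The helper names split_part and first_label are hypothetical stand-ins written for this illustration, not a database API, and the empty-string fallback for an out-of-range field is an assumption about split_part()'s behavior.

```python
def split_part(text, delimiter, n):
    """Rough stand-in for SQL split_part(): n-th field (1-based), '' if absent."""
    fields = text.split(delimiter)
    return fields[n - 1] if 1 <= n <= len(fields) else ""


def first_label(web):
    # Same steps as the query: lower-case, drop 'www.', take the first field.
    return split_part(web.lower().replace("www.", ""), ".", 1)


matches = [first_label(w) for w in ("WWW.HELLO.COM", "hi.co.uk")]
print(matches)
```

Each result is an ordinary string, so there are no curly braces to strip afterwards.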

I'm a little confused. Doesn't this do what you want?
with demo as (
select 'WWW.HELLO.COM' web
union all
select 'hi.co.uk' web
)
select (regexp_matches(replace(lower(web), 'www.',''), '([^\.]*)'))[1]
from demo
This is basically your query with extra parentheses so it does not generate a syntax error.
Here is a db<>fiddle illustrating that it returns what you want.

Related

ORACLE: How to use regexp_like to find a string with single quotes between two characters?

I need to query the DB for all records that have two single quotes between characters. Example: We've, who's.
I have the regex https://regex101.com/r/6MtB9j/1 but it doesn't work with REGEXP_LIKE.
I tried this:
SELECT content
FROM MyTable
WHERE REGEXP_LIKE (content, '(?<=[a-zA-Z])''(?=[a-zA-Z])')
Appreciate the help!
Oracle regex does not support lookarounds.
You do not actually need lookaround in this case, you can use
SELECT content
FROM MyTable
WHERE REGEXP_LIKE (content, '[a-zA-Z]''[a-zA-Z]')
This works because REGEXP_LIKE only needs to find one match: it returns true if there is one and false otherwise, which decides whether the row is fetched.
Lookarounds are useful in case you need to replace or extract values, when matches may overlap.
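The claim that lookarounds only matter when matches may overlap can be checked outside the database. The following is a minimal Python sketch, assuming Python's re module is an acceptable stand-in for these simple patterns (Oracle's regex flavor differs in other respects); the sample strings are made up for illustration.

```python
import re

# Pattern from the question (lookarounds) and from the answer (plain classes).
LOOKAROUND = re.compile(r"(?<=[a-zA-Z])'(?=[a-zA-Z])")
CONSUMING = re.compile(r"[a-zA-Z]'[a-zA-Z]")

samples = ["We've", "who's", "'quoted'", "no quote", "a'b'c"]

# For a yes/no test, both patterns give the same answer on every sample.
existence_agrees = all(
    bool(LOOKAROUND.search(s)) == bool(CONSUMING.search(s)) for s in samples
)

# When counting, the consuming pattern swallows the shared letter 'b',
# so it misses the second quote in "a'b'c".
lookaround_count = len(LOOKAROUND.findall("a'b'c"))
consuming_count = len(CONSUMING.findall("a'b'c"))

print(existence_agrees, lookaround_count, consuming_count)
```

The same caveat applies to any count-based check: a pattern that consumes the neighboring letters can undercount when two quotes share a letter.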
If you just need a single quote in a string, you can use:
where content like '%''%'
If they specifically need to be letters, then you need a regular expression:
regexp_like(content, '[a-zA-Z][''][a-zA-Z]')
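or, using Oracle's q-quote syntax so the quote does not need doubling (a backslash does not escape a quote in an Oracle string literal):
regexp_like(content, q'{[a-zA-Z]'[a-zA-Z]}')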
If I understand correctly, you may need something like
regexp_count(content, '[a-zA-Z]''[a-zA-Z]') = 2.
For example, this
with myTable(content) as
(
select q'[what's]' from dual union all
select q'[who's, what's]' from dual union all
select q'[who's, what's, I'm]' from dual
)
select *
from myTable
where regexp_count(content, '[a-zA-Z]''[a-zA-Z]') = 2
gives
CONTENT
------------------
who's, what's

Get everything between two strings using regexp_substr

I would like to write a query that gets everything between two strings.
For example, getting everything between utm_source and the '&' sign. This is what I have tried:
select regexp_substr(full_utm,'%utm_source%','%&%') from db
However, this is invalid syntax.
Here is a sample of what I am trying to extract:
?utm_source=Facebook&utm_medium=CPC&utm_campaign=April+LAL+-+All+SA+-+CAP+250&utm_content=01noprice
I have also tried this
regexp_substr(full_utm, 'utm_source=(.*)&',1)
but this returns this:
utm_source=Facebook&utm_medium=CPC&utm_campaign=April+LAL+-+All+SA+-+CAP+250&
I've also tried using split_part:
select split_part(split_part(full_utm,'%utm_source=%',1),'&',1)
The problem is that this returns both sources and campaigns (e.g. utm_campaign=xyz).
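You can use regexp_replace() instead, capturing only the characters up to the next & (a greedy (.*) would run on to the last & in the string):
select regexp_replace(full_utm, '.*utm_source=([^&]*).*', '\1')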
from (select '?utm_source=Facebook&utm_medium=CPC&utm_campaign=April+LAL+-+All+SA+-+CAP+250&utm_content=01noprice' as full_utm from dual) x;
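The difference between a greedy (.*) capture and a negated character class can be tried on the sample URL without a database. This is a minimal Python sketch, assuming Python's re module behaves like the SQL engine's regex for these two simple patterns; the variable names are illustrative only.

```python
import re

full_utm = (
    "?utm_source=Facebook&utm_medium=CPC"
    "&utm_campaign=April+LAL+-+All+SA+-+CAP+250&utm_content=01noprice"
)

# Greedy: (.*) grabs as much as it can, so the capture ends at the LAST '&'.
greedy = re.search(r"utm_source=(.*)&", full_utm).group(1)

# Bounded: [^&]* cannot cross an '&', so the capture ends at the FIRST one.
bounded = re.search(r"utm_source=([^&]*)", full_utm).group(1)

print(greedy)
print(bounded)
```

The bounded form also works when utm_source happens to be the last parameter, because it does not require a trailing & at all.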

Oracle LIKE Not working

Trying to get to grips with Oracle, coming from a SQL environment.
Does anyone know why this query returns 0?
SELECT COUNT( * ) FROM MORGS.LOGS l
WHERE ( l.LOCATION = 'X:\Import\XXX006' ) AND
( l.DIRECTION = 'IN' ) AND
( 'XXX006-Test.txt' LIKE '%XXX006.D$Date,YYYYMMDD$.T$Date,HHNNSS$%' ) -- It fails on this condition
Please note that 'XXX006-Test.txt' on the left-hand side of LIKE is the value of the column in the table. I've hard-coded it here just for the demo.
Thanks in advance.
Actually LIKE is working. I'm afraid it's your logic that's faulty. LIKE tests whether the whole of the first operand matches the pattern in the second operand, where the wildcard % stands for any run of characters (including none) and _ stands for exactly one character.
So this is TRUE ...
where 'ABC' like 'ABC%'
... and this is FALSE ...
where 'ABC' like 'ABCDEF'
Looking at your actual test:
( 'XXX006-Test.txt' LIKE '%XXX006.D$Date,YYYYMMDD$.T$Date,HHNNSS$%' )
we notice that the pattern requires the literal text XXX006.D$Date,YYYYMMDD$.T$Date,HHNNSS$ to appear somewhere in the value. XXX006-Test.txt does not contain that text, so LIKE quite rightly returns FALSE.
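The matching rule can be mimicked in a few lines to see why the test fails. This is a minimal Python sketch, assuming only the two standard wildcards % and _ and no ESCAPE clause; sql_like is a hypothetical helper written for this illustration, not an Oracle function.

```python
import re


def sql_like(value, pattern):
    """Approximate SQL LIKE: the WHOLE value must match the pattern."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


print(sql_like("ABC", "ABC%"))
print(sql_like("ABC", "ABCDEF"))
print(sql_like("XXX006-Test.txt", "%XXX006.D$Date,YYYYMMDD$.T$Date,HHNNSS$%"))
print(sql_like("XXX006-Test.txt", "XXX006%"))
```

Only the literal characters between the wildcards have to be present in the value, which is why a short prefix pattern such as 'XXX006%' matches while the long template string does not.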
" Do you know how I can split the RHS on a '.' and grab only the first index of the split results which is 'XXX006'?"
If the required match is always six characters long the simplest thing is
substr('XXX006-Test.txt', 1, 6)
If the length of the leading part varies, you can use regular expressions. To extract everything before the .txt extension:
regexp_replace ( 'XXX006-Test.txt', '(.+)\.txt$','\1' )
Although given the values in the two strings you might want to match on the dash instead ...
regexp_replace ( 'XXX006-Test.txt', '([A-Za-z0-9]+)-(.*)','\1' )
Depends how stable the pattern is.

SQL: Finding dynamic length characters in a data string

I am not sure how to do this, but I have a string of data. I need to isolate a number out of the string that can vary in length. The original string also varies in length. Let me give you an example. Here is a set of the original data string:
:000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:
:000000000:715186816:P:000001996:::H1009671:H1009671:
For these two examples, I need 3SA70000SUPPL from the first and H1009671 from the second. How would I do this using SQL? I have heard that case statements might work, but I don't see how. Please help.
This works in Oracle 11g:
with tbl as (
select ':000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:' str from dual
union
select ':000000000:715186816:P:000001996:::H1009671:H1009671:' str from dual
)
select REGEXP_SUBSTR(str, '([^:]*)(:|$)', 1, 8, NULL, 1) data
from tbl;
Which can be described as "look at the 8th occurrence of zero or more non-colon characters that are followed by a colon or the end of the line, and return the 1st subgroup (which is the data less the colon or end of the line)".
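The "n-th occurrence, first subgroup" idea can be reproduced outside Oracle to check which field the pattern lands on. This is a minimal Python sketch, assuming Python's re module treats this pattern the same way; nth_field is a hypothetical helper name.

```python
import re

FIELD = re.compile(r"([^:]*)(:|$)")


def nth_field(text, n):
    """Return group 1 of the n-th match (1-based), or None if there are fewer."""
    matches = list(FIELD.finditer(text))
    return matches[n - 1].group(1) if 1 <= n <= len(matches) else None


rows = [
    ":000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:",
    ":000000000:715186816:P:000001996:::H1009671:H1009671:",
]
for row in rows:
    print(nth_field(row, 8))
```

Because the pattern accepts zero non-colon characters, the empty fields between consecutive colons are counted too, which is what makes the position stable when some fields are blank.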
From this post: REGEX to select nth value from a list, allowing for nulls
Sorry, just saw you are using DB2. I don't know if there is an equivalent regular expression function, but maybe it will still help.
For the fun of it: SQL Fiddle
The first SUBSTRING takes everything after the ::: marker, and the second SUBSTRING keeps that remainder up to the next colon:
declare @x varchar(1024)=':000000000:715186816:P:000001996:::H1009671:H1009671:'
declare @temp varchar(1024)= SUBSTRING(@x,patindex('%:::%', @x)+3, len(@x))
SELECT SUBSTRING( @temp, 0,CHARINDEX(':', @temp, 0))

Finding first and second word in a string in SQL Developer

How can I find the first word and second word in a string separated by unknown number of spaces in SQL Developer? I need to run a query to get the expected result.
String:
Hello Monkey this is me
Different sentences have different number of spaces between the first and second word and I need a generic query to get the result.
Expected Result:
Hello
Monkey
I have managed to find the first word using substr and instr. However, I do not know how to find the second word due to the unknown number of spaces between the first and second word.
select substr((select ltrim(sentence) from table1),1,
(select (instr((select ltrim(sentence) from table1),' ',1,1)-1)
from table1))
from table1
Since you seem to want them as separate result rows, you could use a simple common table expression to duplicate the rows, once with the full row, then with the first word removed. Then all you have to do is get the first word from each:
WITH cte AS (
SELECT value FROM table1
UNION ALL
SELECT SUBSTR(TRIM(value), INSTR(TRIM(value), ' ')) FROM table1
)
SELECT SUBSTR(TRIM(value), 1, INSTR(TRIM(value), ' ') -1) word
FROM cte
Note that this very simple example assumes that there is a second word; if there isn't, NULL will be returned for both words.
An SQLfiddle to test with.
While Joachim Isaksson's answer is a robust and fast approach, you can also consider splitting the string and selecting from the resulting set of pieces. This is just meant as a hint at another approach in case your requirements change (e.g. more than two pieces).
You could split on the regex /[ ]+/ and so get the words between the runs of blanks.
Find more about splitting here: How do I split a string so I can access item x?
This will strongly depend on the SQL dialect you are using.
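As a dialect-neutral illustration of the splitting idea, here is a minimal Python sketch that splits on runs of blanks and keeps the first two pieces. The function name first_two_words and the sample sentence with extra spaces are made up for this example.

```python
import re


def first_two_words(sentence):
    """Split on one or more blanks and return at most the first two words."""
    words = re.split(r"[ ]+", sentence.strip())
    return words[:2]


print(first_two_words("Hello     Monkey this is me"))
print(first_two_words("  Hello Monkey   this is me"))
```

Since the split pattern matches the whole run of blanks at once, the number of spaces between the words does not matter.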
Try this with REGEXP_SUBSTR:
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),
REGEXP_SUBSTR(REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),'\w+$'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+$')
FROM table1;
result:
1     | 2      | 3           | 4    | 5
Hello | Monkey | Monkey this | this | is_me
To learn more about REGEXP_SUBSTR, refer to Using Regular Expressions With Oracle Database.
Test it on SqlFiddle: http://sqlfiddle.com/#!4/8e9ef/9
If you only want the first and the second word, use REGEXP_INSTR to get the start position of the second word:
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+') AS FIRST,
REGEXP_SUBSTR(sentence,'\w+\s',REGEXP_INSTR(sentence,'\w+\s+')+length(REGEXP_SUBSTR(sentence,'\w+\s+'))) AS SECOND
FROM table1;