Get everything between two strings using regexp_substr - sql

I would like to write a query that gets everything between two strings, for example everything between utm_source and the '&' sign. This is what I have tried:
select regexp_substr(full_utm,'%utm_source%','%&%') from db
However, this is invalid syntax.
Here is a sample of the string I am trying to extract from:
?utm_source=Facebook&utm_medium=CPC&utm_campaign=April+LAL+-+All+SA+-+CAP+250&utm_content=01noprice
I have also tried this:
regexp_substr(full_utm, 'utm_source=(.*)&',1)
but this returns:
utm_source=Facebook&utm_medium=CPC&utm_campaign=April+LAL+-+All+SA+-+CAP+250&
I've also tried using split_part:
select split_part(split_part(full_utm,'%utm_source=%',1),'&',1)
The problem is that this returns both the source and the campaign (e.g. utm_campaign=xyz).

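You can use regexp_replace() instead. Capture with [^&]* rather than .*, because .* is greedy and would run on to the last '&' in the string:
select regexp_replace(full_utm, '.*utm_source=([^&]*).*', '\1')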
from (select '?utm_source=Facebook&utm_medium=CPC&utm_campaign=April+LAL+-+All+SA+-+CAP+250&utm_content=01noprice' as full_utm from dual) x;
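To see why the capture group needs [^&]* rather than .*, here is a small check using Python's re module. It is a sketch under one assumption: Python's engine stands in for the database's, and greediness behaves the same way for these two patterns. The negated class also copes with utm_source being the last parameter, where no '&' follows.

```python
import re

# Sample value from the question
full_utm = ("?utm_source=Facebook&utm_medium=CPC"
            "&utm_campaign=April+LAL+-+All+SA+-+CAP+250&utm_content=01noprice")

def source_greedy(s: str) -> str:
    # '(.*)&' is greedy: the group runs on to the LAST '&' in the string
    m = re.search(r"utm_source=(.*)&", s)
    return m.group(1) if m else ""

def source_negated(s: str) -> str:
    # '([^&]*)' cannot cross an '&', so it stops at the first one (or at the end)
    m = re.search(r"utm_source=([^&]*)", s)
    return m.group(1) if m else ""

print(source_greedy(full_utm))   # Facebook&utm_medium=CPC&utm_campaign=April+LAL+-+All+SA+-+CAP+250
print(source_negated(full_utm))  # Facebook
```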

Related

BigQuery - Using regexp with LIKE operator (?)

I'd like to get product IDs from URLs, and I've almost fine-tuned a query to do it, but there is still an issue I cannot solve.
The URL usually looks like this:
/xp-pen/toll-spe43-deco-pro-small-medium-spe43-tobuy-p665088831/
or
/harry-potter-es-a-tuz-serlege-2019-m19247107/
As you can see, there are two types of IDs:
in general, IDs start with '-p'
IDs of some special products start with '-m'
I created this CASE WHEN statement:
CASE
WHEN MAX(hits.page.pagePath) LIKE '%-p%'
THEN MAX(REGEXP_REPLACE(REGEXP_EXTRACT(
hits.page.pagePath, '-p[0-9]+/'), '\\-|p|/', ''))
WHEN MAX(hits.page.pagePath) LIKE '%-m%'
THEN MAX(REGEXP_REPLACE(REGEXP_EXTRACT(
hits.page.pagePath, '-m[0-9]+/'), '\\-|m|/', ''))
ELSE NULL
END AS productId
It looks a little complicated at first, but I really needed both REGEXP_REPLACE and REGEXP_EXTRACT, because '-p' and '-m' don't appear only before the ID; they can appear multiple times in a URL.
The problem with my code is that there are some special cases where the URL looks like this:
/elveszett-profeciak-2019-m17855487/
As you can see, the ID starts with '-m', but the URL also contains '-p'. In this case the query returns an empty value: the first WHEN branch matches the '-p' of '-profeciak', and REGEXP_EXTRACT then finds no '-p' followed by digits.
I think it could be solved by modifying the LIKE operator in the WHEN part of the CASE statement: LIKE '%-p%' or LIKE '%-m%'.
It would be great to have a regular expression after, or instead of, the LIKE operator, something similar to the '-p[0-9]+/' pattern I used in the REGEXP_EXTRACT function.
So what I need is a condition in the WHEN part that checks whether '-p' or '-m' is followed by digits in the URL.
I'm not sure whether this is possible in BQ.
I think you want '-p' and '-m' followed by digits. If so, I think this does what you want:
select regexp_extract(url, '-[pm][0-9]+')
from (select '/xp-pen/toll-spe43-deco-pro-small-medium-spe43-tobuy-p665088831/' as url union all
select '/elveszett-profeciak-2019-m17855487/' union all
select '/harry-potter-es-a-tuz-serlege-2019-m19247107/'
) x
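The pattern can be sanity-checked outside BigQuery with Python's re, assuming this simple pattern behaves the same there as in BigQuery's RE2 engine. The sketch adds a capture group, '-[pm]([0-9]+)', so only the digits come back; BigQuery's REGEXP_EXTRACT likewise returns the captured substring when the pattern contains one group. The helper name product_id is illustrative only.

```python
import re

urls = [
    "/xp-pen/toll-spe43-deco-pro-small-medium-spe43-tobuy-p665088831/",
    "/elveszett-profeciak-2019-m17855487/",
    "/harry-potter-es-a-tuz-serlege-2019-m19247107/",
]

# '-', then 'p' or 'm', then one or more digits; the group keeps only the digits.
# '-pen', '-pro', '-profeciak' and '-medium' are skipped: no digit follows.
ID_PATTERN = re.compile(r"-[pm]([0-9]+)")

def product_id(url: str):
    m = ID_PATTERN.search(url)
    return m.group(1) if m else None

for u in urls:
    print(product_id(u))  # 665088831, then 17855487, then 19247107
```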

How to get the nth match from regexp_matches() as plain text

I have this code:
with demo as (
select 'WWW.HELLO.COM' web
union all
select 'hi.co.uk' web)
select regexp_matches(replace(lower(web),'www.',''),'([^\.]*)') from demo
And the table I get is:
regexp_matches
{hello}
{hi}
What I would like to do is:
with demo as (
select 'WWW.HELLO.COM' web
union all
select 'hi.co.uk' web)
select regexp_matches(replace(lower(web),'www.',''),'([^\.]*)')[1] from demo
Or even the BigQuery version:
with demo as (
select 'WWW.HELLO.COM' web
union all
select 'hi.co.uk' web)
select regexp_matches(replace(lower(web),'www.',''),'([^\.]*)')[offset(1)] from demo
But neither works. Is this possible? If it isn't clear, the result I would like is:
match
hello
hi
Use split_part() instead; it is simpler and faster. To get the first word, before the first separator '.':
WITH demo(web) AS (
VALUES
('WWW.HELLO.COM')
, ('hi.co.uk')
)
SELECT split_part(replace(lower(web), 'www.', ''), '.', 1)
FROM demo;
See:
Split comma separated column data into additional columns
regexp_matches() returns setof text[], i.e. 0-n rows of text arrays: without the 'g' flag it returns at most one row, with it one row per match, and each array holds the capture groups of that match.
In Postgres 10 or later, there is also the simpler variant regexp_match(), which returns only the first match, i.e. a single text[]. Either way, the surrounding curly braces in your result are the text representation of an array literal.
You could take the first row and extract the first element of the array, but since you want neither the set nor the array in the first place, use split_part() instead: simpler and faster, if less versatile, but good enough for the purpose. And it returns exactly what you want: text.
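As an illustration of what split_part(replace(lower(web), 'www.', ''), '.', 1) computes, here is a Python sketch. The split_part helper is hypothetical, written to mimic Postgres behaviour for positive field numbers, including returning an empty string when the requested field does not exist.

```python
def split_part(s: str, delimiter: str, n: int) -> str:
    # Hypothetical helper mimicking Postgres split_part() for n >= 1:
    # fields are counted from one, and a missing field yields ''.
    if n < 1:
        raise ValueError("this sketch only handles n >= 1")
    parts = s.split(delimiter)
    return parts[n - 1] if n <= len(parts) else ""

def first_label(web: str) -> str:
    # Mirrors split_part(replace(lower(web), 'www.', ''), '.', 1);
    # replace() here is a literal replacement, not a regex
    return split_part(web.lower().replace("www.", ""), ".", 1)

for web in ["WWW.HELLO.COM", "hi.co.uk"]:
    print(first_label(web))  # hello, then hi
```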
I'm a little confused. Doesn't this do what you want?
with demo as (
select 'WWW.HELLO.COM' web
union all
select 'hi.co.uk' web
)
select (regexp_matches(replace(lower(web), 'www.',''), '([^\.]*)'))[1]
from demo
This is basically your query with extra parentheses so it does not generate a syntax error.
It returns the result you want: hello and hi.

ORACLE: How to use regexp_like to find a string with single quotes between two characters?

I need to query the DB for all records that have single quotes between two characters, for example: We've, who's.
I have the regex https://regex101.com/r/6MtB9j/1 but it doesn't work with REGEXP_LIKE.
Tried this
SELECT content
FROM MyTable
WHERE REGEXP_LIKE (content, '(?<=[a-zA-Z])''(?=[a-zA-Z])')
Appreciate the help!
Oracle regex does not support lookarounds.
You do not actually need a lookaround in this case; you can use:
SELECT content
FROM MyTable
WHERE REGEXP_LIKE (content, '[a-zA-Z]''[a-zA-Z]')
This works because REGEXP_LIKE only needs to find one match: it returns true if there is a match and false otherwise, which decides whether the record is fetched.
Lookarounds are useful when you need to replace or extract values and the matches may overlap.
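To make the overlap point concrete, here is a Python sketch; Python's re supports lookarounds, unlike Oracle, so it is used only to show the difference. The consuming pattern uses up the letter after the quote, so in a string like a'b'c the second quote is missed when counting, while the lookaround version finds both. For a yes/no test, as REGEXP_LIKE performs, the two agree.

```python
import re

CONSUMING = re.compile(r"[a-zA-Z]'[a-zA-Z]")            # Oracle-compatible form
LOOKAROUND = re.compile(r"(?<=[a-zA-Z])'(?=[a-zA-Z])")  # needs lookaround support

s = "a'b'c"
# Existence check (what REGEXP_LIKE does): both patterns agree.
print(bool(CONSUMING.search(s)), bool(LOOKAROUND.search(s)))  # True True
# Counting matches: the consuming pattern uses up 'b', so it finds only one.
print(len(CONSUMING.findall(s)), len(LOOKAROUND.findall(s)))  # 1 2
```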
If you just need a single quote in a string, you can use:
where content like '%''%'
If they specifically need to be letters, then you need a regular expression:
regexp_like(content, '[a-zA-Z][''][a-zA-Z]')
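or, with Oracle's alternative quoting mechanism, which avoids doubling the quote (a backslash does not escape a quote inside an Oracle string literal, so '[a-zA-Z]\'[a-zA-Z]' does not work):
regexp_like(content, q'{[a-zA-Z]'[a-zA-Z]}')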
If I understand correctly, you may need something like:
regexp_count(content, '[a-zA-Z]''[a-zA-Z]') = 2
For example, this
with myTable(content) as
(
select q'[what's]' from dual union all
select q'[who's, what's]' from dual union all
select q'[who's, what's, I'm]' from dual
)
select *
from myTable
where regexp_count(content, '[a-zA-Z]''[a-zA-Z]') = 2
gives
CONTENT
------------------
who's, what's
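The same filter can be sketched in Python to confirm the per-row counts, assuming re.findall counts non-overlapping matches the same way REGEXP_COUNT does for this pattern (both resume searching after the end of each match).

```python
import re

rows = ["what's", "who's, what's", "who's, what's, I'm"]
PATTERN = re.compile(r"[a-zA-Z]'[a-zA-Z]")

# Count non-overlapping matches per row, like REGEXP_COUNT
counts = {row: len(PATTERN.findall(row)) for row in rows}
print(counts)  # {"what's": 1, "who's, what's": 2, "who's, what's, I'm": 3}

matching = [row for row in rows if counts[row] == 2]
print(matching)  # ["who's, what's"]
```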

Regular expression for getting data after - in SQL

I have a column with assignment numbers like: 11827, 27266, 91717, 09818-2, 726252-3, 8716151-0, 827272, 18181
Currently I am selecting the records like this:
select assignment_number from table;
But now I want the numbers retrieved without the -2, -3, etc. suffixes, like:
726252-3 --> 726252
8716151-0 --> 8716151
I know I can use a regex for this, but I do not know how to use it.
This will match everything before the first - character:
^([^-]+)
From 726252-3 it will match 726252.
You would use regexp_substr():
select regexp_substr(assignment_number, '[0-9]+') from table;
This will return the first string of numbers encountered in the string.
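Both patterns can be checked on the sample values from the question with a short Python sketch, assuming Python's re matches these simple patterns the same way the database does. Leading zeros, as in 09818, are kept because the result is still a string.

```python
import re

values = ["11827", "27266", "91717", "09818-2", "726252-3",
          "8716151-0", "827272", "18181"]

def before_dash(v: str) -> str:
    # '^([^-]+)': re.match anchors at the start, playing the role of '^';
    # the group is everything up to (not including) the first '-'
    m = re.match(r"([^-]+)", v)
    return m.group(1) if m else ""

def first_digits(v: str) -> str:
    # '[0-9]+': the first run of digits, like regexp_substr(v, '[0-9]+')
    m = re.search(r"[0-9]+", v)
    return m.group(0) if m else ""

for v in values:
    print(v, before_dash(v), first_digits(v))
```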

Select query that displays Joined words separately, not using a function

I require a select query that adds a space to the data based on the placement of the capital letters, i.e. 'HelpMe' would be displayed as 'Help Me' by this query. Note that I cannot use a stored function to do this; it must be done in the query itself. The data is of variable length, and the query must be in SQL. Any help will be appreciated.
Thanks
You need a user-defined function for this until Microsoft adds support for regular expressions. A solution would look something like:
SELECT col1, dbo.RegExReplace(col1, '([A-Z])',' \1') FROM Table
Although this produces a leading space, you can remove it with TRIM.
Replace regular expression function:
http://connect.microsoft.com/SQLServer/feedback/details/378520
You can read about dbo.RegexReplace at:
TSQL Replace all non a-z/A-Z characters with an empty string
Assuming you are using Oracle RDBMS, you can use REGEXP_REPLACE:
SELECT REGEXP_REPLACE('ILikeToWatchCSIMiami',
'([A-Z.])', ' \1')
AS RX_REPLACE
FROM dual
;
This produces ' I Like To Watch C S I Miami' (with a leading space).
But as you can see, it does not handle acronyms such as CSI well.
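One way around the acronym problem without lookarounds (which Oracle lacks) is two passes: first insert a space between a lowercase letter and the uppercase letter after it, then between an uppercase letter and a following uppercase-lowercase pair. The sketch below shows the idea with Python's re.sub. That the same two patterns carry over to nested REGEXP_REPLACE calls with '\1 \2' replacements is an assumption, not something tested here, and digits or punctuation get no special handling.

```python
import re

def naive_split(s: str) -> str:
    # What '([A-Z])' -> ' \1' does: a space before every capital, then trim
    return re.sub(r"([A-Z])", r" \1", s).strip()

def split_camel_case(s: str) -> str:
    # Pass 1: lowercase followed by uppercase, e.g. "eT" -> "e T"
    s = re.sub(r"([a-z])([A-Z])", r"\1 \2", s)
    # Pass 2: uppercase followed by uppercase+lowercase, e.g. "IMi" -> "I Mi",
    # which ends an acronym such as "CSI" before the next word
    return re.sub(r"([A-Z])([A-Z][a-z])", r"\1 \2", s)

print(naive_split("ILikeToWatchCSIMiami"))       # I Like To Watch C S I Miami
print(split_camel_case("ILikeToWatchCSIMiami"))  # I Like To Watch CSI Miami
print(split_camel_case("HelpMe"))                # Help Me
```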