Extract extension from email address in SQL - sql

I want to extract the extension from an email address.
Input: test#work.com
Output: com
Input: test#work.test.com
Output: test.com
I tried,
(REVERSE(LEFT(REVERSE('test#work.test.com'), CHARINDEX('.', REVERSE('test#work.test.com')) - 1)))
This works only for the first input. Any help?

It seems you want to remove any characters prior to and including the first period (.) after the at symbol (#). I would use CHARINDEX and STUFF for this:
SELECT STUFF(V.Email,1,CHARINDEX('.',V.Email,CHARINDEX('#',V.Email)),'')
FROM (VALUES('test#work.com'),
('test#work.test.com'))V(Email);
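To see what the CHARINDEX/STUFF combination does step by step, here is a rough Python sketch of the same logic. It is only an illustration, not T-SQL: the function name email_extension and the at_sign parameter are made up for this example, and it assumes the separator is the # shown in the sample data.

```python
def email_extension(email: str, at_sign: str = "#") -> str:
    """Drop everything up to and including the first '.' after the separator."""
    # Mirrors CHARINDEX('#', Email); fall back to the start if it is missing.
    start = max(email.find(at_sign), 0)
    # Mirrors CHARINDEX('.', Email, <start>): first period at or after the separator.
    dot_pos = email.find(".", start)
    if dot_pos == -1:
        # No period found: nothing is removed, like STUFF with a length of 0.
        return email
    # Mirrors STUFF(Email, 1, <dot position>, ''): cut the prefix, keep the rest.
    return email[dot_pos + 1:]


if __name__ == "__main__":
    for sample in ("test#work.com", "test#work.test.com"):
        print(sample, "->", email_extension(sample))
```

Starting the period search at the separator is what keeps a period in the local part (for example first.last#work.com) from being picked up.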

Related

Alternative for Positive Lookahead on Big Query - Match everything before the last delimiter

I'm currently cleaning up URLs and I want to get everything before the last slash ("/")
This is an example string:
https://www.businessinsider.de/gruenderszene/plus-angebot/?tpcc=onsite_gs_header_nav&verification_code=DOVCGF75J8LSID
and the part I want to extract is: https://www.businessinsider.de/gruenderszene/plus-angebot
With normal RegEx, it is super simple with .*(?=\/)
You can see it here on regex101.com
Can you help me to replicate this on BigQuery please, as they don't allow for lookahead/lookbehind?
I might phrase this as a regex replacement which removes the last path separator and path:
SELECT url, REGEXP_REPLACE(url, r'/[^/]+$', '') AS url_out
FROM yourTable;
If you want to specifically target a final path separator immediately followed by a query parameter, then use:
SELECT url, REGEXP_REPLACE(url, r'/\?[^/]+$', '') AS url_out
FROM yourTable;
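If it helps to check the two patterns outside BigQuery, they can be tried with Python's re module, which accepts the same syntax for these particular expressions. This is only a sketch; the function names are invented for the example, and the second URL is a hypothetical one without a query string.

```python
import re


def strip_last_segment(url: str) -> str:
    # Removes the final '/' and whatever non-slash text follows it.
    return re.sub(r"/[^/]+$", "", url)


def strip_trailing_query(url: str) -> str:
    # Removes the final '/' only when it is immediately followed by '?...'.
    return re.sub(r"/\?[^/]+$", "", url)


if __name__ == "__main__":
    url = ("https://www.businessinsider.de/gruenderszene/plus-angebot/"
           "?tpcc=onsite_gs_header_nav&verification_code=DOVCGF75J8LSID")
    print(strip_last_segment(url))
    print(strip_trailing_query(url))
    # A URL with no query string: only the first pattern changes it.
    print(strip_last_segment("https://example.com/a/b"))
    print(strip_trailing_query("https://example.com/a/b"))
```

The difference shows up on URLs without a query string: the first pattern still drops the last path segment, while the second leaves the URL untouched.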

Regexp_extract everything after appearance of '-q_'

I have strings, some of which contain 'q_', and I want to extract everything that comes after it. Example values in the column are:
prod-q_cat_trait_cat_social_issue
_prod-q_body_modification_graffiti
event_tickets
dappled_grey
_prod-q_cat_tech_support
What is wrong with my regular expression? I'm trying to keep only what follows the '_' after q.
REGEXP_EXTRACT(queue_id, '[^q_]+$')
Is just returning
issue
I've also tried the split method:
SPLIT(queue_id, 'q_')[OFFSET(2)]
But this returns
Array index 2 is out of bounds (overflow)
Any suggestions? Thanks! (I am using Google Cloud SQL)
Using a capturing group, you may extract all after the first q_ with:
REGEXP_EXTRACT(queue_id, 'q_(.*)')
You may extract all after the last q_ with:
REGEXP_EXTRACT(queue_id, '.*q_(.*)')
See the regex demo #1 and regex demo #2.
Here, q_ finds the first occurrence of q_ and (.*) grabs the rest of the line into Group 1, which is the value returned by REGEXP_EXTRACT. In the second regex, the leading .* matches any 0+ chars other than line break chars, as many as possible, which is why it starts capturing the rest of the line only after the last occurrence of q_.
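The behaviour of the two patterns, and of the [^q_]+$ attempt from the question, can be reproduced with Python's re module as a quick check. This is a sketch only: the helper names are made up, and the value 'a-q_b-q_c' is a hypothetical input with two occurrences of q_, since none of the sample rows has more than one.

```python
import re
from typing import Optional


def after_first(value: str) -> Optional[str]:
    # Group 1 holds everything after the first q_; None when q_ is absent.
    m = re.search(r"q_(.*)", value)
    return m.group(1) if m else None


def after_last(value: str) -> Optional[str]:
    # The greedy leading .* pushes the match to the last q_.
    m = re.search(r".*q_(.*)", value)
    return m.group(1) if m else None


def original_attempt(value: str) -> Optional[str]:
    # [^q_] is a character class: any single char that is neither 'q' nor '_'.
    m = re.search(r"[^q_]+$", value)
    return m.group(0) if m else None


if __name__ == "__main__":
    row = "prod-q_cat_trait_cat_social_issue"
    print(after_first(row))
    print(original_attempt(row))
    print(after_first("a-q_b-q_c"), after_last("a-q_b-q_c"))
    print(after_first("event_tickets"))
```

The original attempt returns only issue because [^q_]+$ means "the trailing run of characters that are neither q nor _", not "everything after the text q_".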
Google Cloud SQL can run MySQL. If that is your engine, I think the simplest method is substring_index():
select substring_index(queue_id, '-q_', -1)
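For anyone unfamiliar with substring_index() and a count of -1, the effect is "everything after the last occurrence of the delimiter, or the whole string if the delimiter is missing". A rough Python equivalent, with a made-up function name, would be:

```python
def after_last_delimiter(value: str, delimiter: str = "-q_") -> str:
    # rsplit with maxsplit=1 cuts at the last delimiter only; [-1] is the tail.
    # When the delimiter is absent, the single-element list yields the input.
    return value.rsplit(delimiter, 1)[-1]


if __name__ == "__main__":
    for row in ("prod-q_cat_trait_cat_social_issue",
                "_prod-q_cat_tech_support",
                "event_tickets"):
        print(row, "->", after_last_delimiter(row))
```

Note that rows without the delimiter come back unchanged rather than as NULL, which differs from the REGEXP_EXTRACT approach.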
Can you try this : q_([^q_]+)$? You'll have what you want in the first group.
Edit: this one matches all the cases: (?(?<=-q_).*|^((?!-q_).)*$)

how to get specific part from string in sql

I want to retrieve file names from URLs in SQL.
For example:
Input:
url:
https://www.google.co.in/root/subdir/file.extension?p1=v1&p2=v2
https://www.abxdhcak.com/sitemap-companies.xml
The output should be:
file.extension
sitemap-companies.xml
To match your expected output you can use REGEXP_REPLACE
REGEXP_REPLACE(txt, '^.*/|\?.*$') as rg
This does 2 things:
'^.*/'
This removes all characters up to and including the last forward-slash in the string.
'\?.*$'
This removes all characters after and including a question mark.
This may not work for all cases, but it works for the examples provided.
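The two alternatives can be checked quickly with Python's re.sub, which treats this pattern the same way. This is a sketch, not the database call itself; the function name is invented, and the third URL is a hypothetical input chosen to show one case where the pattern breaks.

```python
import re


def file_name_from_url(url: str) -> str:
    # First alternative: greedy ^.*/ removes everything through the last '/'.
    # Second alternative: \?.*$ removes the '?' and the rest of the string.
    return re.sub(r"^.*/|\?.*$", "", url)


if __name__ == "__main__":
    print(file_name_from_url(
        "https://www.google.co.in/root/subdir/file.extension?p1=v1&p2=v2"))
    print(file_name_from_url("https://www.abxdhcak.com/sitemap-companies.xml"))
    # A '/' inside the query string fools the greedy first alternative.
    print(file_name_from_url("https://x.com/a/file.txt?next=/home"))
```

The last call returns home rather than file.txt, because the greedy ^.*/ runs through the slash inside the query string.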

SQL Regex - Select everything after '/' and split into array

I have to write an HSQLDB query that splits this string on '/':
/2225/golf drive/#305/Huntsville/AL/1243
This is where I am at
select REGEXP_SUBSTRING_ARRAY(Terms, '/[a-zA-Z0-9]*') as ARR from Address
This is giving me
/2225, /golf, /, /Huntsville, /AL, /1243 - (Missing "#305" and "drive" in second split)
How can I modify the regex such that it includes everything after "/" and give me this result
/2225, /golf drive, /#305, /Huntsville, /AL, /1243
In this case, why not use the regexp /[a-zA-Z0-9, #]*? It seems to fit your goal.
I've checked, it works here for me: https://regex101.com/r/8bJQEk/1
PS: the regexp /\/([^\/]*)/g can help to split everything. Be careful with the slashes.
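To compare the three patterns side by side, here is a small Python sketch that uses re.findall in place of REGEXP_SUBSTRING_ARRAY. That substitution is an assumption made for illustration: both return every non-overlapping match, but HSQLDB's regex engine is Java's, so edge cases may differ.

```python
import re

ADDRESS = "/2225/golf drive/#305/Huntsville/AL/1243"

# Original pattern: letters and digits only, so it stops at ' ' and '#'.
original = re.findall(r"/[a-zA-Z0-9]*", ADDRESS)

# Widened class: also allows comma, space and '#'.
widened = re.findall(r"/[a-zA-Z0-9, #]*", ADDRESS)

# Most general: a slash followed by anything that is not a slash.
general = re.findall(r"/[^/]*", ADDRESS)

if __name__ == "__main__":
    print(original)
    print(widened)
    print(general)
```

The negated class [^/] is the most robust of the three, since it does not need to list every character that may appear in a segment.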

How to check that the whole string matches a pattern, instead of finding matching substrings, using NSRegularExpression? [duplicate]

I would like to write a regular expression that starts with the string "wp" and ends with the string "php" to locate a file in a directory. How do I do it?
Example file: wp-comments-post.php
This should do it for you: ^wp.*php$
Matches
wp-comments-post.php
wp.something.php
wp.php
Doesn't match
something-wp.php
wp.php.txt
^wp.*\.php$ should do the trick.
The .* means "any character, repeated 0 or more times". The next . is escaped because it's a special character, and you want a literal period (".php"). Don't forget that if you're typing this in as a literal string in something like C#, Java, etc., you need to escape the backslash because it's a special character in many literal strings.
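As a quick way to compare the two suggested patterns, here is a Python sketch (the question is about NSRegularExpression, so this only illustrates the pattern semantics, not that API). The helper name matches and the file name wpphp are made up for the example.

```python
import re

LOOSE = re.compile(r"^wp.*php$")     # starts with "wp", ends with "php"
STRICT = re.compile(r"^wp.*\.php$")  # starts with "wp", ends with ".php"

# Without a raw string, the backslash itself has to be escaped.
ESCAPED = "^wp.*\\.php$"


def matches(pattern: "re.Pattern[str]", name: str) -> bool:
    # The ^ and $ anchors force the whole name to fit the pattern.
    return pattern.search(name) is not None


if __name__ == "__main__":
    for name in ("wp-comments-post.php", "wp.php", "something-wp.php",
                 "wp.php.txt", "wpphp"):
        print(name, matches(LOOSE, name), matches(STRICT, name))
```

The two patterns only disagree on names like wpphp, where there is no literal period before php.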
Example:
ajshdjashdjashdlasdlhdlSTARTasdasdsdaasdENDaknsdklansdlknaldknaaklsdn
1) START\w*END
return: STARTasdasdsdaasdEND - matches a run of word characters (letters, digits, underscore) between START and END
2) START\d*END
return: START12121212END (for an input that contains it) - matches digits only between START and END
3) START\d*_\d*END
return: START1212_1212END (for an input that contains it) - matches digits separated by a single _ between START and END
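These three patterns can be tried in Python as a sanity check. This is a sketch with an invented helper name; the inputs for patterns 2 and 3 are hypothetical strings built around the return values quoted above, since the sample string itself contains no digits.

```python
import re
from typing import Optional

SAMPLE = "ajshdjashdjashdlasdlhdlSTARTasdasdsdaasdENDaknsdklansdlknaldknaaklsdn"


def first_match(pattern: str, text: str) -> Optional[str]:
    # Returns the first substring that fits the pattern, or None.
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(first_match(r"START\w*END", SAMPLE))
    # The sample has letters between START and END, so \d* cannot bridge them.
    print(first_match(r"START\d*END", SAMPLE))
    print(first_match(r"START\d*END", "xxSTART12121212ENDyy"))
    print(first_match(r"START\d*_\d*END", "xxSTART1212_1212ENDyy"))
```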