Remove all text between 2 sentences in regex

I'm going crazy with regex.
I need to extract the words between FROM and WHERE in this syntax:
SELECT IDClient, Client FROM Client WHERE IDClient = 1 GROUP BY IDClient, Client ORDER BY IDClient
result = Client
How can I resolve this using regular expressions?

/FROM (.*) WHERE/i

(?<=FROM\s+).*(?=\s+WHERE)
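That uses a lookbehind and a lookahead to get what is between FROM and WHERE, and can be modified depending on whether you want the whitespace or not. Note that the variable-length lookbehind (\s+ inside it) is only supported by some regex engines, such as .NET.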
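
As a minimal sketch of the capture-group approach in Python (the function name table_between is hypothetical and not from the answers above): Python's re module does not accept a variable-width lookbehind such as (?<=FROM\s+), so this version uses a lazy capture group between the two keywords instead.

```python
import re

# Lazy capture between the keywords; \b stops FROM/WHERE matching inside longer words.
FROM_WHERE = re.compile(r"\bFROM\s+(.*?)\s+WHERE\b", re.IGNORECASE | re.DOTALL)

def table_between(sql):
    """Return the text between FROM and WHERE, or None if either keyword is missing."""
    m = FROM_WHERE.search(sql)
    return m.group(1) if m else None

query = ("SELECT IDClient, Client FROM Client WHERE IDClient = 1 "
         "GROUP BY IDClient, Client ORDER BY IDClient")
print(table_between(query))  # Client
```

The lazy .*? stops at the first WHERE, whereas the greedy (.*) in the shorter answer would run to the last WHERE if the query contained more than one.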

Use a regex cheat sheet, it's not too hard to work out.

You can use this online regular expression builder:
http://gskinner.com/RegExr/
Or try the tutorials at:
regular-expressions dot info

Related

group by part of url using regex splunk

I have multiple URLs that all start with /api/net. I want to group by the next two path segments, which are separated by /, like:
/api/net/abc/def?key=value
/api/net/c/d?key1=value1
/api/net/j/h?key2=value2
I have the regular expression below, which parses all the URLs, but I have to list the required segments explicitly in it.
| rex field=requestPath "(?<volga>.+?(\/abc\/def)|(\/c\/d)|(\/j\/h).+?)"
volga is a named capturing group. I want to group by volga without hard-coding /abc/def, /c/d and /j/h in the regular expression, so that I can see how many distinct values there are.
There are other values that I would not know to add, so I want to group by the next two segments (split by /) after "net" and ignore the rest of the URL. Let me know if this is unclear and I can explain more.

If I understand the question correctly, this regex will parse the URL and return the two domains as 'dom1' and 'dom2', respectively. Then you can group/sort on them.
... | rex field=requestPath "\/api\/net\/(?<dom1>[^\/]+)\/(?<dom2>[^\/\?]+)"
| stats values(*) as * by dom1,dom2
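
To check the idea outside Splunk, here is a rough Python sketch of the same extraction and grouping. The names group_key and counts are hypothetical, and the fourth path is an invented sample added so that one group has more than one member.

```python
import re
from collections import Counter

# Two path segments after /api/net/, each ending at the next "/" or "?".
SEGMENTS = re.compile(r"^/api/net/([^/?]+)/([^/?]+)")

def group_key(path):
    """Return (segment1, segment2) for an /api/net/... path, or None if it does not match."""
    m = SEGMENTS.match(path)
    return m.groups() if m else None

paths = [
    "/api/net/abc/def?key=value",
    "/api/net/c/d?key1=value1",
    "/api/net/j/h?key2=value2",
    "/api/net/abc/def?key=other",  # invented sample, not from the question
]

# Count how many paths fall into each (segment1, segment2) group.
counts = Counter(k for k in map(group_key, paths) if k is not None)
print(counts[("abc", "def")])  # 2
```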

Using regexp in Big Query to extract URLs

I've been trying to extract any URL present within my 'Text' column in Big Query. The column contains a mixture of text and URLs dotted throughout (a cell might contain more than one URL). I'm trying to use this regexp:
SELECT
REGEXP_EXTRACT (Text, r'(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9%_:?\+.~#&//=]*')
FROM
Data.Text_Files
I currently get 'failed to parse regular expression' when I try to run the query. I've tried modifying it but to no avail.
The regexp works in an online builder but I'm just not sure how to incorporate it into Big Query.
Any help would be much appreciated - or at least pointers on how to incorporate regular expressions into Big Query!

Try below - it is for BigQuery Standard SQL (see Enabling Standard SQL and Migrating from legacy SQL)
WITH YourTable AS (
SELECT 1 AS id, 'What have you tried so far? Please edit your question to show a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) of the code that you are having problems with, then we can try to help with the specific problem. You can also read [How to Ask](http://stackoverflow.com/help/how-to-ask). ' AS Text UNION ALL
SELECT 2 AS id, 'Important on SO, you can mark accepted answer by using the tick on the left of the posted answer, below the voting. see http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work#5235 for why it is important. There are more ... You can check about what to do when someone answers your question - http://stackoverflow.com/help/someone-answers.' AS Text UNION ALL
SELECT 3 AS id, 'If an answer has helped you solve your problem and you accept it you should also consider voting it up. See more at http://stackoverflow.com/help/someone-answers and Upvote section in http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work#5235' AS Text
)
SELECT
id,
REGEXP_EXTRACT_ALL(Text, r'(?i:(?:(?:(?:ftp|https?):\/\/)(?:www\.)?|www\.)(?:[\da-z-_\.]+)(?:[a-z\.]{2,7})(?:[\/\w\.-_\?\&]*)*\/?)') AS URL
FROM YourTable
This gives you output with the id field and a repeated field holding all the URLs found in each row.
If you need a flattened result, you can use the variation below:
WITH YourTable AS (
SELECT 1 AS id, 'What have you tried so far? Please edit your question to show a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) of the code that you are having problems with, then we can try to help with the specific problem. You can also read [How to Ask](http://stackoverflow.com/help/how-to-ask). ' AS Text UNION ALL
SELECT 2 AS id, 'Important on SO, you can mark accepted answer by using the tick on the left of the posted answer, below the voting. see http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work#5235 for why it is important. There are more ... You can check about what to do when someone answers your question - http://stackoverflow.com/help/someone-answers.' AS Text UNION ALL
SELECT 3 AS id, 'If an answer has helped you solve your problem and you accept it you should also consider voting it up. See more at http://stackoverflow.com/help/someone-answers and Upvote section in http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work#5235' AS Text
)
SELECT
id, URL
FROM (
SELECT id, REGEXP_EXTRACT_ALL(Text, r'(?i:(?:(?:(?:ftp|https?):\/\/)(?:www\.)?|www\.)(?:[\da-z-_\.]+)(?:[a-z\.]{2,7})(?:[\/\w\.-_\?\&]*)*\/?)') AS URL
FROM YourTable
), UNNEST(URL) as URL
Note: you can use any regexp you find on the web here, but only one capturing group is allowed. Every inner group must therefore be made non-capturing with ?:, as in the examples above, and the ONLY group left without ?: should be the one you expect to see in the output.
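
The effect of a capturing group on the returned value can be seen in other engines too. As a rough Python analogue (Python's re is a different engine from the one BigQuery uses, and the simplified URL pattern and sample text here are assumptions, not the regex from this answer), re.findall returns the group contents instead of the whole match as soon as a capturing group is present:

```python
import re

text = "See http://stackoverflow.com/help/mcve and https://www.example.com/a/b for details."

# Same pattern twice; only the first group differs (capturing vs. non-capturing).
capturing = r"(https?)://[\w.-]+(?:/[\w./-]*)?"
non_capturing = r"(?:https?)://[\w.-]+(?:/[\w./-]*)?"

with_group = re.findall(capturing, text)         # only the scheme of each URL
without_group = re.findall(non_capturing, text)  # the full URLs

print(with_group)
print(without_group)
```

With the capturing version only the scheme comes back, which is why stray inner groups in a copied pattern give surprising results.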

Your regex has an incomplete capturing group, and has 2 unescaped characters. I don't know which online regex builder you're using, but maybe you forgot to put your new regex into it?
The problems are as follows:
(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9%_:?\+.~#&//=]*
POINTERS TO PROBLEMS ON THIS LINE ---> ^1 ^^2
1. This is the start of a capturing group with no end. You probably want the ) right before the *.
2. All slashes need to be escaped. This should probably be \/ or maybe even \/\\.
Here is an example with both of my suggestions implemented: https://regex101.com/r/pt1hqS/1
Good luck fixing it!

Parse SQL with REGEX to find Physical Update

I've spent a bit of time trying to bend regex to my will, but it's beaten me.
Here's the problem, for the following text...
--to be matched
UPDATE dbo.table
UPDATE TOP 10 PERCENT dbo.table
--do not match
UPDATE #temp
UPDATE TOP 10 PERCENT #temp
I'd like to match the first two update statements and not match the last two. So far I have the regex...
UPDATE\s?\s+[^#]
I've been trying to get the regex to ignore the TOP 10 PERCENT part, as it just gets in the way, but I haven't been successful.
Thanks in advance.
I'm using .NET 3.5.

I assume you're trying to parse real SQL syntax (looks like SQL Server) so I've tried something that is more suitable for that (rather than just detecting the presence of #).
You can try regex like:
UPDATE\s+(TOP.*?PERCENT\s+)?(?!(#|TOP.*?PERCENT|\s)).*
It checks for UPDATE followed by optional TOP.*?PERCENT and then by something that is not TOP.*?PERCENT and doesn't start with #. It doesn't check just for the presence of # as this may legitimately appear in other position and not mean a temp table.
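
A quick way to check this pattern against the four sample statements is a small script. The sketch below uses Python rather than .NET (the constructs used here should behave the same way in both engines, but that is an assumption worth verifying in your own environment), and the helper name is_physical_update is hypothetical.

```python
import re

# The pattern from the answer above, unchanged.
PHYSICAL_UPDATE = re.compile(
    r"UPDATE\s+(TOP.*?PERCENT\s+)?(?!(#|TOP.*?PERCENT|\s)).*"
)

def is_physical_update(statement):
    """True if the statement updates something other than a #temp table."""
    return PHYSICAL_UPDATE.match(statement) is not None

statements = [
    "UPDATE dbo.table",
    "UPDATE TOP 10 PERCENT dbo.table",
    "UPDATE #temp",
    "UPDATE TOP 10 PERCENT #temp",
]
for s in statements:
    print(s, "->", is_physical_update(s))
```

The \s and TOP.*?PERCENT alternatives inside the negative lookahead matter: without them the engine could backtrack, skip the optional TOP group or give back a space, and still match the #temp statements.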

As I understand it, you want a regex to interact with SQL code, not actually querying a database?
You can use a negative look ahead to check if the line has #temp:
(?m)^(?!.*#temp).*UPDATE
(?!...) will fail the whole match if what's inside it matches, ^ matches the beginning of the line when combined with the m modifier. (?m) is the inline version of this modifier, as I don't know how/where you plan on using the regex.
See demo here.

@Robin's solution is much better, but in case you need a regex that uses simpler mechanisms, here is one:
UPDATE\s+(TOP\s+10\s+PERCENT\s+)?[a-z\.]+

sqlconsumer, here's a fully functioning C# .NET program. Does it do what you're looking for?
using System;
using System.Text.RegularExpressions;
class Program {
    static void Main() {
        string s1 = "UPDATE dbo.table";
        string s2 = "UPDATE TOP 10 PERCENT dbo.table";
        string s3 = "UPDATE #temp";
        string s4 = "UPDATE TOP 10 PERCENT #temp";
        // Verbatim string (@"...") so the backslashes reach the regex engine unchanged.
        string pattern = @"UPDATE\s+(?:TOP 10 PERCENT\s+)?dbo\.\w+";
        Console.WriteLine(Regex.IsMatch(s1, pattern));
        Console.WriteLine(Regex.IsMatch(s2, pattern));
        Console.WriteLine(Regex.IsMatch(s3, pattern));
        Console.WriteLine(Regex.IsMatch(s4, pattern));
        Console.WriteLine("\nPress Any Key to Exit.");
        Console.ReadKey();
    } // END Main
} // END Program
The Output:
True
True
False
False

Two IP Address match

I need to match two IP addresses with a regular expression:
Like 20.20.20.20
should match with 20.20.20.20
should match with [http://20.20.20.20/abcd]
should not match with 20.20.20.200
should not match with [http://20.20.20.200/abcd]
should not match with [http://120.20.20.20/abcd]
At present I am using this regular expression: ".*[^(\d)]20.20.20.20[^(\d)].*"
But it is not working for the 1st and 3rd cases. Please help me with this regular expression.

You're ignoring the case where the line starts with 20.20.20.20:
"(.*[^(\d)]|^)20.20.20.20([^(\d)].*|$)"
seems to work for me

You can do it like this:
select * from tablename
where ip = '20.20.20.20' or ip like 'http://20.20.20.20/%'

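[^(\d)] without a quantifier means that you expect exactly one character that is not a digit (or a parenthesis), so the pattern fails when 20.20.20.20 sits at the very start or end of the string.
Using [^(\d)]* will help, but note that it then also lets 20.20.20.200 match, because zero characters are allowed between the address and the next digit.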
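
Another way to express "not preceded or followed by a digit" is with lookarounds, which match at the start and end of the string without consuming a character. The following is a minimal Python sketch, not taken from any of the answers; the function name contains_ip is hypothetical.

```python
import re

def contains_ip(text, ip="20.20.20.20"):
    """True if `ip` occurs in `text` with no digit directly before or after it."""
    # re.escape makes the dots literal; the lookarounds reject adjacent digits.
    pattern = r"(?<!\d)" + re.escape(ip) + r"(?!\d)"
    return re.search(pattern, text) is not None

samples = [
    "20.20.20.20",
    "[http://20.20.20.20/abcd]",
    "20.20.20.200",
    "[http://20.20.20.200/abcd]",
    "[http://120.20.20.20/abcd]",
]
for s in samples:
    print(s, "->", contains_ip(s))
```

This only guards against neighbouring digits. A string such as 20.20.20.20.5 would still be accepted, so stricter validation needs more than this sketch.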

regular expression to pull words beginning with #

Trying to parse an SQL string and pull out the parameters.
Ex: "select * from table where [Year] between #Yr1 and #Yr2"
I want to pull out "#Yr1" and "#Yr2"
I have tried many patterns, but none has worked, such as:
matches = Regex.Matches(sSQL, "\b#\w*\b")
and
matches = Regex.Matches(sSQL, "\b\#\w*\b")
Any help?

You're trying to put a word boundary after the #, rather than before. Maybe this:
\w(#[A-Z0-9a-z]+)
or
\w(#[^\s]+)

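I would have gone with the following (the alternations are grouped so that ^ and $ act only as boundaries):
/(?:^|\s)(#\w+)(?:\s|$)/
or if you didn't want to include the #
/(?:^|\s)#(\w+)(?:\s|$)/
though I also like joel's above, so maybe one of these
/(?:^|\s)(#[^\s]+)(?:\s|$)/
/(?:^|\s)#([^\s]+)(?:\s|$)/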
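
The original attempts fail because \b before # needs a word character on one side of that position, and in the example both the preceding space and the # are non-word characters. As a small Python sketch of one alternative (the pattern is an assumption, not one of the answers above), a negative lookbehind can stand in for the leading boundary:

```python
import re

# "#" not preceded by a word character, followed by one or more word characters.
PARAM = re.compile(r"(?<!\w)#\w+")

sql = "select * from table where [Year] between #Yr1 and #Yr2"
params = PARAM.findall(sql)
print(params)  # ['#Yr1', '#Yr2']
```

Because the lookbehind consumes nothing, two parameters separated by a single space or comma are both found.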
