What's a regex that will match lines whose previous line starts with a set of characters?
I'm trying to parse M3U files, and I need to match the lines whose preceding line starts with the tag #EXTINF:, as in this example:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:11.54
ASMIK_tid_0000250058_m.600000-00000.ts
#EXTINF:8.51
ASMIK_tid_0000250058_m.600000-00001.ts
#EXTINF:11.76
ASMIK_tid_0000250058_m.600000-00002.ts
#EXTINF:10.05
ASMIK_tid_0000250058_m.600000-00003.ts
etc...
I only want to extract these lines:
ASMIK_tid_0000250058_m.600000-00000.ts
ASMIK_tid_0000250058_m.600000-00001.ts
ASMIK_tid_0000250058_m.600000-00002.ts
ASMIK_tid_0000250058_m.600000-00003.ts
I've tried variations on this answer and this: (?#EXT.*\n) but had no luck...
First, make sure the function you are using matches against the whole file rather than line by line; otherwise this is impossible, because a single line never contains its predecessor.
Then you would need to specify a lookbehind:
(?<=#EXTINF.*\r\n).*
If your regex implementation does not support lookbehinds OR repetition inside of a lookbehind, you can use two capture groups instead:
(#EXTINF.*\r\n)(.*)
Obviously you would simply ignore the first capture group, but keep all of the data in the second capture group.
If you need to manually specify that the . does not match newlines, you can specify the mode at the beginning of the regex: (?-s)
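As a rough illustration of the capture-group variant, here is a minimal Python sketch. Python's re module rejects variable-width lookbehinds, so the lookbehind form above would not compile there. The sample text is copied from the question; the names playlist, SEGMENT_AFTER_EXTINF and segments are made up for this example, and \r?\n is used instead of \r\n on the assumption that the file may have either kind of line ending.

```python
import re

# Sample playlist text taken from the question ("\n" line endings assumed here).
playlist = "\n".join([
    "#EXTM3U",
    "#EXT-X-VERSION:3",
    "#EXT-X-TARGETDURATION:10",
    "#EXTINF:11.54",
    "ASMIK_tid_0000250058_m.600000-00000.ts",
    "#EXTINF:8.51",
    "ASMIK_tid_0000250058_m.600000-00001.ts",
])

# With re.MULTILINE, ^ anchors at the start of every line. The #EXTINF line is
# matched but not captured; only the line that follows it is captured.
SEGMENT_AFTER_EXTINF = re.compile(r"^#EXTINF.*\r?\n(.*)", re.MULTILINE)

# In Python "." matches "\r", so strip a trailing "\r" left over from CRLF files.
segments = [s.rstrip("\r") for s in SEGMENT_AFTER_EXTINF.findall(playlist)]
print(segments)
```

The whole text is passed to the regex in one go, which is the "match the whole file" requirement mentioned above.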
Related
I need to retrieve the leading part (SEALS_LME_TRADES_MBL) of the string below. This value is in a column within my Postgres database table.
SEALS_LME_TRADES_MBL_20220919_00212.csv
I tried to use the functions substring, reverse, and strpos, but they all have limitations. It seems like regex is the best option; however, I was not able to get it working.
Essentially I need to substring from beginning till the second last '_'. I do not want the date and sequence number along with the file extension at the end.
The closest regex I managed to get is: ^(([^]*){4})
https://regex101.com/
This looks a little wonky, but how about this?
select substring ('SEALS_LME_TRADES_MBL_20220919_00212.csv', '^(.+)_[^_]+_[^_]+')
Translation
^ from the beginning
(.+) any characters (capture and return this value), followed by
_ an underscore, followed by
[^_]+ one or more non-underscores, followed by
_ an underscore, followed by
[^_]+ one or more non-underscores
Regex greediness will cause any incidental underscores to be captured in the initial string.
Technically speaking the last portion (one or more non-underscores) can probably be omitted.
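To see the greediness and backtracking at work outside the database, here is a small Python sketch of the same pattern. It is only an illustration: Python's re engine is not the Postgres engine, although this pattern behaves the same in both. The names PREFIX, PREFIX_SHORT and prefix_of are invented for the example.

```python
import re

filename = "SEALS_LME_TRADES_MBL_20220919_00212.csv"

# Same pattern as in the substring() call above.
PREFIX = re.compile(r"^(.+)_[^_]+_[^_]+")
# Variant with the final "one or more non-underscores" omitted.
PREFIX_SHORT = re.compile(r"^(.+)_[^_]+_")


def prefix_of(pattern, text):
    """Return capture group 1, or None when the pattern does not match."""
    match = pattern.match(text)
    return match.group(1) if match else None


print(prefix_of(PREFIX, filename))        # SEALS_LME_TRADES_MBL
print(prefix_of(PREFIX_SHORT, filename))  # SEALS_LME_TRADES_MBL
```

The greedy (.+) first grabs the whole string and then backs off until two underscore-delimited parts remain to its right, which is why both variants stop at the second-to-last underscore.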
I would like to have the substring after the last occurrence of a certain character.
Now I found here how to get the first, second or so parts, but I need only the last part.
The input data is a list of file directories:
c:\dir\subdir\subdir\file.txt
c:\dir\subdir\subdir\file2.dat
c:\dir\subdir\file3.png
c:\dir\subdir\subdir\subdir\file4.txt
Unfortunately this is the data I have to work with; otherwise I could list it using the command prompt.
The problem is that the number of directories varies from row to row.
My code based on the previous link is:
select (regexp_split_to_array(BTRIM(path),'\\'))[1] from myschema.mytable
So far I've tried a few things in the brackets that came to mind, for example [end], [-1] etc.
None of them work. Is there a way to get the last part without reversing my strings, taking the first part, and then reversing it back?
You can use regexp_matches():
select (regexp_matches(path, '[^\\]+$'))[1]
Here is a db<>fiddle.
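The same "run of non-backslashes anchored at the end" idea can be tried out in Python. This is only a sketch for experimenting with the pattern; the names paths, LAST_PART and last_part are made up, and the sample paths come from the question.

```python
import re

# Raw strings keep the backslashes in the Windows paths literal.
paths = [
    r"c:\dir\subdir\subdir\file.txt",
    r"c:\dir\subdir\subdir\file2.dat",
    r"c:\dir\subdir\file3.png",
    r"c:\dir\subdir\subdir\subdir\file4.txt",
]

# [^\\]+ is one or more characters that are not a backslash; $ pins the run to
# the end of the string, so it can only be the last path component.
LAST_PART = re.compile(r"[^\\]+$")


def last_part(path):
    """Return the text after the last backslash, or None if the path ends with one."""
    match = LAST_PART.search(path)
    return match.group(0) if match else None


file_names = [last_part(p) for p in paths]
print(file_names)
```

Because the match is anchored at the end, the number of directories in front of the file name does not matter.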
I have strings, some of which contain 'q_', and for those I want to extract everything that comes after it. Example values in the column are:
prod-q_cat_trait_cat_social_issue
_prod-q_body_modification_graffiti
event_tickets
dappled_grey
_prod-q_cat_tech_support
What is wrong with my regular expression? I'm trying to extract everything after the '_' that follows the q.
REGEXP_EXTRACT(queue_id, '[^q_]+$')
Is just returning
issue
I've also tried the split method:
SPLIT(queue_id, 'q_')[OFFSET(2)]
But this returns
Array index 2 is out of bounds (overflow)
Any suggestions? Thanks! (I am using Google Cloud SQL)
Using a capturing group, you may extract all after the first q_ with:
REGEXP_EXTRACT(queue_id, 'q_(.*)')
You may extract all after the last q_ with:
REGEXP_EXTRACT(queue_id, '.*q_(.*)')
See the regex demo #1 and regex demo #2.
Here, q_ finds the first occurrence of q_ and (.*) grabs the rest of the line into Group 1, and this is the value returned by REGEXP_EXTRACT. .* matches any 0+ chars other than line break chars as many as possible, that is why the second regex will start capturing the rest of the line after the last occurrence of q_.
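The difference between the two patterns, and the reason the original attempt returned only issue, can be checked with a short Python sketch. This uses Python's re module rather than REGEXP_EXTRACT, so it is an approximation of the SQL behaviour; the helper names and the value a-q_b-q_c are hypothetical and exist only to show a string with two occurrences of q_.

```python
import re


def after_first(text):
    """Everything after the first 'q_', or None when there is no 'q_'."""
    match = re.search(r"q_(.*)", text)
    return match.group(1) if match else None


def after_last(text):
    """Everything after the last 'q_': the leading greedy .* eats earlier ones."""
    match = re.search(r".*q_(.*)", text)
    return match.group(1) if match else None


sample = "prod-q_cat_trait_cat_social_issue"
print(after_first(sample))            # cat_trait_cat_social_issue
print(after_first("event_tickets"))   # None

# Hypothetical value with two occurrences of q_.
print(after_first("a-q_b-q_c"))       # b-q_c
print(after_last("a-q_b-q_c"))        # c

# [^q_] is a character class: "any single character except q or _". It is not
# "anything except the sequence q_", so only the last run without q or _ matches.
print(re.search(r"[^q_]+$", sample).group(0))  # issue
```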
Google Cloud SQL can run MySQL (it also offers PostgreSQL and SQL Server). If you are on MySQL, I think the simplest method is substring_index():
select substring_index(queue_id, '-q_', -1)
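For readers who want to reproduce the "everything after the last -q_" result outside MySQL, a rough Python equivalent is str.rpartition. This is an analogy rather than a description of how substring_index is implemented, and the function name after_last_delimiter is invented for the example.

```python
def after_last_delimiter(text, delimiter="-q_"):
    """Return the part of text after the last delimiter.

    When the delimiter is absent, rpartition returns ('', '', text), so the
    whole string comes back unchanged.
    """
    return text.rpartition(delimiter)[2]


values = [
    "prod-q_cat_trait_cat_social_issue",
    "_prod-q_body_modification_graffiti",
    "event_tickets",
]
results = [after_last_delimiter(v) for v in values]
print(results)
```

Note that rows without the delimiter are returned whole rather than as NULL, which may or may not be what you want.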
Can you try this: q_([^q_]+)$? You'll have what you want in the first group.
Edit: this one matches all the cases: (?(?<=-q_).*|^((?!-q_).)*$)
Suppose we want to keep an entire line only if a particular word, say 'test', appears at the start of the line.
If it appears anywhere else, the entire line should be removed.
e.g.
if function_test()=5; //here this entire line should be removed
test sample =5; //here this entire line should be kept
From Oracle 10g R2 on, you should be able to use the anchor \A to require the match at the beginning of the string (so this only works for single-line strings).
http://www.regular-expressions.info/oracle.html
What do you mean by keep / remove lines? Where is this regex supposed to run? I.e. is it part of an SQL command, part of a grep, or something else?
In SQL you can use the LIKE operator:
WHERE line LIKE 'test%'
You can use substring too:
WHERE substring(line, 1, 4) = 'test'
Using grep or any other language, you can specify start of line, e.g.:
grep '^test' bigfile.txt
Try...
...
WHEN REGEXP_LIKE(string,'^test','i') THEN
-- this is a good line: do what you want with it, or return string
END
...
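The answers above all boil down to anchoring the word at the start of the line. As a language-neutral check of that idea, here is a minimal Python sketch; the sample lines are taken from the question, and the names STARTS_WITH_TEST and kept are made up. The re.IGNORECASE flag mirrors the 'i' option in the REGEXP_LIKE call and can be dropped if case matters.

```python
import re

lines = [
    "if function_test()=5;",
    "test sample =5;",
]

# ^ anchors the word at the start of the line, so "test" appearing later in
# the line (as in function_test) does not count.
STARTS_WITH_TEST = re.compile(r"^test", re.IGNORECASE)

kept = [line for line in lines if STARTS_WITH_TEST.search(line)]
print(kept)
```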
I'm trying to figure out the base regex to capture the middle of a Google URL stored in an SQL database.
For example, a few links:
https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234
https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789
What would be the regex to capture the text dodge+durango or jeep+cherokee+crossover? (It's alright for the + to stay in there.)
My Attempts:
1) A hard-coded pattern:
\b[=.]\W\b\w{5}\b[+.]?\w{7}
This clearly does not work in general, as it is hard-coded and would only work for something like the dodge durango example (it would extract "dodge+durango").
2) A positive lookahead:
[^+](?=&id)
I am not fully sure how to use this, as it only grabs the one character before the & symbol.
How can I extract a string of (potentially) any length, with any number of + delimiters, between the "model=" and "&id" boundaries?
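It seems like you could use regexp_replace and access the match groups. The pattern has to cover the whole string, because regexp_replace only substitutes the part that matched and leaves the rest of the URL in place:
regexp_replace(input, '^.*model=([^&]*).*$', E'\\1')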
from here:
The regexp_replace function provides substitution of new text for
substrings that match POSIX regular expression patterns. It has the
syntax regexp_replace(source, pattern, replacement [, flags ]). The
source string is returned unchanged if there is no match to the
pattern. If there is a match, the source string is returned with the
replacement string substituted for the matching substring. The
replacement string can contain \n, where n is 1 through 9, to indicate
that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted. Write \\ if you need to put a literal backslash in the
replacement text. The flags parameter is an optional text string
containing zero or more single-letter flags that change the function's
behavior. Flag i specifies case-insensitive matching, while flag g
specifies replacement of each matching substring rather than only the
first one.
I may be misunderstanding, but if you want to get the model, just select everything between model= and the ampersand (&).
regexp_matches(input, 'model=([^&]*)')
model=: Match literally
([^&]*): Capture
[^&]*: Anything that isn't an ampersand
*: Unlimited times
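The breakdown above can be tried out with a small Python sketch. The regex is the one from the answer; the second half uses the standard library's urllib.parse as an alternative, which is my own addition rather than something the answers propose. The names MODEL, model_from_url, models and decoded are invented for the example.

```python
import re
from urllib.parse import parse_qs, urlsplit

urls = [
    "https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234",
    "https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789",
]

# Literal "model=", then capture everything up to (not including) the next "&".
MODEL = re.compile(r"model=([^&]*)")


def model_from_url(url):
    """Return the raw model value, or None when the URL has no model parameter."""
    match = MODEL.search(url)
    return match.group(1) if match else None


models = [model_from_url(u) for u in urls]
print(models)   # ['dodge+durango', 'jeep+cherokee+crossover']

# Alternative without a regex: parse the query string. parse_qs decodes "+" as
# a space, so the values come back as plain words.
decoded = [parse_qs(urlsplit(u).query)["model"][0] for u in urls]
print(decoded)  # ['dodge durango', 'jeep cherokee crossover']
```

Because [^&]* stops at the first ampersand or at the end of the string, the pattern also works when model is the last parameter in the URL.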