REGEX_EXTRACT for specific pattern inside brackets - sql

I'm trying to use REGEX_EXTRACT in SQL to extract certain string patterns inside brackets.
I have tried this formula: REGEX_EXTRACT(column, r'\[(.*?)\]'), but the problem is that there are multiple brackets in the same cell, and this formula only extracts the contents of the first bracket.
So what I'm trying to figure out is how to extract a specific pattern within the brackets. The pattern I'm looking for looks like this: [xx-XX]
where x can be any letter of the alphabet.
Any tips or directions would be greatly appreciated.

This should work if you always have 2 lowercase letters followed by '-' and then followed by 2 uppercase letters:
\[([a-z]{2}-[A-Z]{2})\]
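
As a quick way to check that pattern outside the database, here is a minimal sketch using Python's re module in place of the SQL function. The sample cell value and the helper name extract_locale are made up for illustration.

```python
import re

# Pattern from the answer: two lowercase letters, a hyphen,
# then two uppercase letters, all inside square brackets.
LOCALE_IN_BRACKETS = re.compile(r"\[([a-z]{2}-[A-Z]{2})\]")


def extract_locale(cell):
    """Return the first [xx-XX] code found in the cell, or None if there is none."""
    match = LOCALE_IN_BRACKETS.search(cell)
    return match.group(1) if match else None


# Hypothetical cell with several bracketed values; only one has the xx-XX shape.
sample = "[draft] Landing page [en-US] [v2]"
print(extract_locale(sample))
```

Because the character classes are specific, the other brackets in the cell are skipped rather than matched first.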

Related

using Regex get substring between underscore 2 and underscore 3 of string, vb.net

I have a string like: Title Name_2021-04-13_A+B+C_Division.txt. I need to extract the A+B+C. The A+B+C may be other letters. I believe that using Regex would be the simplest way to do this. In other words I need to get the substring between underscore 2 and underscore 3 of string. All of my code is written in vb.net. I have tried:
boatClass = Regex.Match(myFile, "(?<=_)(.*)(?=_)").ToString
I know this is not right but I think it is close. What do I need to add or change?
The regex code that will extract a substring between the second and third underscore of a string is:
(?:[^_]+_){2}([^_]+)
However, I chose to use the split function:
myString.Split("_"c)(2)
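
For comparison, the same two approaches can be sketched in Python (not VB.NET, so treat this as an illustration of the idea rather than drop-in code). The variable names are my own.

```python
import re

file_name = "Title Name_2021-04-13_A+B+C_Division.txt"

# Regex approach: skip two "field_" groups from the start of the string,
# then capture the third field in group 1.
match = re.match(r"(?:[^_]+_){2}([^_]+)", file_name)
via_regex = match.group(1) if match else None

# Split approach: take the third element (index 2) of the underscore-separated fields.
via_split = file_name.split("_")[2]

print(via_regex, via_split)
```

Both give the same field here; the split version is easier to read, while the regex version lets you validate the shape of the whole name at the same time.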

Finding strings between dashes using REGEXP_EXTRACT in Bigquery

In Bigquery, I am trying to find a way to extract particular segments of a string based on how many dashes come before it. The number of total dashes in the string will always be the same. For example, I could be looking for the string after the second dash and before the third dash in the following string:
abc-defgh-hij-kl-mnop
Currently, I am using the following regex to extract, which counts the dashes from the back:
([^-]+)(?:-[^-]+){2}$
The problem is that if there is nothing in between the dashes, the regex doesn't work. For example, something like this returns null:
abc-defgh-hij--mnop
Is there a way to use regex to extract a string after a certain number of dashes and cut it off before the subsequent dash?
Thank you!
The following is for BigQuery Standard SQL.
The simplest way in your case is to use SPLIT with OFFSET, as in the example below:
SELECT SPLIT(str, '-')[OFFSET(3)]
This returns an empty string for abc-defgh-hij--mnop.
To avoid an error when the requested element does not exist, it is better to use SAFE_OFFSET:
SELECT SPLIT(str, '-')[SAFE_OFFSET(3)]
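
The split-and-index behaviour is easy to reason about with a small Python sketch. This is only an analogy for SPLIT with SAFE_OFFSET, not BigQuery code, and segment_at is a hypothetical helper name.

```python
def segment_at(text, index, sep="-"):
    """Return the segment at a zero-based index, or None if it does not exist.

    Mirrors the idea of SAFE_OFFSET: an out-of-range index gives None
    instead of raising an error.
    """
    parts = text.split(sep)
    return parts[index] if 0 <= index < len(parts) else None


print(repr(segment_at("abc-defgh-hij-kl-mnop", 3)))
print(repr(segment_at("abc-defgh-hij--mnop", 3)))
print(repr(segment_at("abc", 3)))
```

An empty segment between two dashes comes back as an empty string, which is different from a missing segment.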

Escape all commas in line except first and last

I have a CSV file which I'm trying to import to a SQL Server table. The file contains lines of 3 columns each, separated by a comma. The only problem is that some of the data in the second column contains an arbitrary number of commas. For example:
1281,I enjoy hunting, fishing, and boating,smith317
I would like to escape all occurrences of commas in each line except the first and the last, such that the result of this line would be:
1281,I enjoy hunting\, fishing\, and boating,smith317
I know I will need some type of regular expression to accomplish this task, but my knowledge of regular expressions is very limited. Currently, I'm trying to use Notepad++ find/replace with regex, but I am open to other ideas.
Any help would be greatly appreciated :-)
This can be done manually in three find/replace passes:
Normal find: replace every , with \, so that every comma is escaped.
Regex find ^(.*)(\\,) and replace it with $1, to restore the last comma on each line.
Regex find (\\,)(.*)$ and replace it with ,$2 to restore the first comma on each line.
Worked for me in Sublime Text 2.
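
If you would rather script it than use an editor, the same three passes can be sketched in Python. The function name escape_inner_commas is my own, and this assumes every line has at least two commas, as in the question.

```python
import re


def escape_inner_commas(line):
    """Escape every comma except the first and the last one on the line."""
    # Pass 1: escape every comma.
    escaped = line.replace(",", "\\,")
    # Pass 2: the greedy .* runs up to the last escaped comma; restore it.
    escaped = re.sub(r"^(.*)\\,", r"\1,", escaped)
    # Pass 3: the first escaped comma is the leftmost match; restore it.
    escaped = re.sub(r"\\,(.*)$", r",\1", escaped, count=1)
    return escaped


line = "1281,I enjoy hunting, fishing, and boating,smith317"
print(escape_inner_commas(line))
```

The order matters only in that the blanket escape has to come first; the two restore passes are independent of each other.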

Using groups in OpenRefine regex

I'm wondering whether it is possible to use "groups" in the regex used in OpenRefine GREL syntax. I mean, I'd like to replace every dot that is preceded and followed by a character with the same character and dot, followed by a space and then the second character.
Something like:
s.replace(/(.{1})\..({1})/,/(1).\s(2)/)
It should, but your last argument needs to be a string, not a regular expression. Internally Refine uses Java's Matcher#replaceAll method which accepts a string argument.
I think I found out how to deal with this. You need to put $X in your string value to refer to the Xth capture group.
It should be like this:
s.replace(/.?(#capture group 1).?(#capture group 2).*?/, " some text $1 some text $2 some text")
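
To illustrate the same capture-group idea outside OpenRefine, here is a rough Python sketch. Note that Python's re.sub writes group references as \1 and \2 rather than $1 and $2, and the choice of \w for "a character" is my assumption about what the question means.

```python
import re


def space_after_dots(text):
    """Insert a space after a dot that sits between two word characters."""
    # Group 1 is the character before the dot, group 2 the character after it.
    return re.sub(r"(\w)\.(\w)", r"\1. \2", text)


print(space_after_dots("First sentence.Second sentence."))
```

The trailing dot is left alone because nothing follows it, so the second group cannot match.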

How should a string be matched with a regular expression in Objective C

I'm finding it hard to match strings using NSRegularExpression. Generic alpha characters are not a problem with [a-z] but if I need to match a word like 'import' I'm struggling to make it work. I'm sure I have to escape the word in some manner but I can't find any docs around this. A really basic example would be
{{import "hello"}}
where I want to get hold of the string: hello
edit: to clarify - 'hello' could be any string - it's the bit I want returned
This regular expression matches the text between the double quotes in your example:
\{\{import "([^"]+)"\}\}
The matched text will be stored in the first capture group.
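
To see the pattern in action without setting up an Xcode project, here is a small sketch using Python's re module in place of NSRegularExpression. The helper name imported_name is made up. The literal word import needs no escaping; only the braces are escaped.

```python
import re

# Same pattern as in the answer; the quoted text is captured in group 1.
IMPORT_TAG = re.compile(r'\{\{import "([^"]+)"\}\}')


def imported_name(text):
    """Return the quoted string inside {{import "..."}}, or None if absent."""
    match = IMPORT_TAG.search(text)
    return match.group(1) if match else None


print(imported_name('{{import "hello"}}'))
```

In Objective-C the backslashes would additionally need to be doubled inside the string literal, since the compiler consumes one level of escaping before the regex engine sees the pattern.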