Massive change polish marks in notepad++ [duplicate] - sql

Consider the following regex:
([a-zA-Z])([a-zA-Z]?)/([a-zA-Z])([a-zA-Z]?)
If the text is: a/b
the capturing groups will be:
/1 'a'
/2 ''
/3 'b'
/4 ''
And if the text is: aa/b
the capturing groups will be:
/1 'a'
/2 'a'
/3 'b'
/4 ''
Suppose, I want to find and replace this string in Notepad++ such that if /2 or /4 are empty (as in the first case above), I prepend c.
So, the text a/b becomes ca/cb.
And the text aa/b becomes aa/cb
I use the following regex for replacing:
(?(2)\1\2|0\1)/(?(4)\3\4|0\3)
But Notepad++ is treating ? literally in this case, and not as a conditional identifier. Any idea what am I doing wrong?

The syntax in the conditional replacement is
(?{GROUP_MATCHED?}REPLACEMENT_IF_YES:REPLACEMENT_IF_NO)
The { and } are necessary to avoid ambiguity when you deal with groups higher than 9 and with named capture groups.
Since Notepad++ uses Boost-Extended Format String Syntax, see this Boost documentation:
The character ? begins a conditional expression, the general form is:
?Ntrue-expression:false-expression
where N is decimal digit.
If sub-expression N was matched, then true-expression is evaluated and sent to output, otherwise false-expression is evaluated and sent to output.
You will normally need to surround a conditional-expression with parenthesis in order to prevent ambiguities.
For example, the format string (?1foo:bar) will replace each match found with foo if the sub-expression $1 was matched, and with bar otherwise.
For sub-expressions with an index greater than 9, or for access to named sub-expressions use:
?{INDEX}true-expression:false-expression
or
?{NAME}true-expression:false-expression
So, use ([a-zA-Z])([a-zA-Z])?/([a-zA-Z])([a-zA-Z])? and replace with (?{2}$1$2:c$1)/(?{4}$3$4:c$3).
The second problem is that you placed the ? quantifier inside the capturing group, making the pattern inside the group optional, but not the whole group. That made the group always "participating in the match", and the condition would be always "true" (always matched). ? should quantify the group.

Related

Teradata regular expressions, 0 or 1 spaces

In Teradata, I'm looking for one regular expression pattern that would allow me to find a pattern of some numbers, then a space or maybe no space, and then 'SF'. It should return 7 in both cases below:
SELECT
REGEXP_INSTR('12345 1000SF', pattern),
REGEXP_INSTR('12345 1000 SF', pattern)
Or, my actual goal is to extract the 1000 in both cases if there's an easier way, probably using REGEXP_SUBSTR. More details are below if you need them.
I have a column that contains free text and I would like to extract the square footage. But, in some cases, there is a space between the number and 'SF' and in some cases there is not:
'other stuff 1000 SF'
'other stuff 1000SF'
I am trying to use the REGEXP_INSTR function to find the starting position. Through google, I have found the pattern for the first to be
'([0-9])+ SF'
When I try the pattern for the second, I try
'([0-9])+SF'
and I get the error
SELECT Failed. [2662] SUBSTR: string subscript out of bounds
I've also found an answer to a similar questions, but they don't work for Teradata. For example, I don't think you can use ? in Teradata.
The error message indicates you're using SUBSTR, not REGEXP_SUBSTR.
Try this:
RegExp_Substr(col, '[0-9]*(?= {0,1}SF)')
Find multiple digits followed by a single optional blank followed by SF and extract those digits.
I would pattern it like this:
\b(\d+)\s*[Ss][Ff]\b
\b # word boundary
(\d+) # 1 or more digits (captured)
\s* # 0 or more white-space characters
[Ss] # character class
[Ff] # character class
\b # word boundary
Demo

regex capture middle of url

I'm trying to figure out the base regex to capture the middle of a google url out of a sql database.
For example, a few links:
https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234
https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789
What would be the regex to capture the text to get dodge+durango , or jeep+cherokee+crossover ? (It's alright that the + still be in there.)
My Attempts:
1)
\b[=.]\W\b\w{5}\b[+.]?\w{7}
, but this clearly does not work as this is a hard coded scenario that would only work like something for the dodge durango example. (would extract "dodge+durango)
2) Using positive lookback ,
[^+]( ?=&id )
but I am not fully sure how to use this, as this only grabs one character behind the & symbol.
How can I extract a string of (potentially) any length with any amount of + delimeters between the "model=" and "&id" boundaries?
seems like you could use regexp_replace and access match groups:
regexp_replace(input, 'model=(.*?)([&\\s]|$)', E'\\1')
from here:
The regexp_replace function provides substitution of new text for
substrings that match POSIX regular expression patterns. It has the
syntax regexp_replace(source, pattern, replacement [, flags ]). The
source string is returned unchanged if there is no match to the
pattern. If there is a match, the source string is returned with the
replacement string substituted for the matching substring. The
replacement string can contain \n, where n is 1 through 9, to indicate
that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted. Write \ if you need to put a literal backslash in the
replacement text. The flags parameter is an optional text string
containing zero or more single-letter flags that change the function's
behavior. Flag i specifies case-insensitive matching, while flag g
specifies replacement of each matching substring rather than only the
first one
I may be misunderstanding, but if you want to get the model, just select everything between model= and the ampersand (&).
regexp_matches(input, 'model=([^&]*)')
model=: Match literally
([^&]*): Capture
[^&]*: Anything that isn't an ampersand
*: Unlimited times

Oracle REGEXP_LIKE doesn't work as expected

I was testing a regular expression in Oracle SQL and found something I could not understand:
-- NO MATCH
SELECT 1 FROM DUAL WHERE REGEXP_LIKE ('Professor Frank', '(^|\s)Prof[^\s]*(\s|$)');
Above doesn't match, while the following matches:
-- MATCH
SELECT 1 FROM DUAL WHERE REGEXP_LIKE ('Professor Frank', '(^|\s)Prof\S*(\s|$)');
In other regex flavors, It will be like \bProf[^\s]*\b versus \bProf\S*\b and have similar results. Note: Oracle SQL regex does not have \b or word boundary.
Question: Why don't [^\s]* and \S* work the same way in Oracle SQL?
I notice if I remove the (\s|$) at the end, the first regex will match.
In Oracle regular expressions, \s is indeed the escape sequence for a space, but NOT in a matching character set (that is, [.....], or [^....] for excluding one character). In a matching character set, only two characters have a special meaning, - for ranges and ] for closing the set enumeration. They can't be escaped; if needed in the matching set, ] must always be the first character right after the opening [ (it is the ONLY position in which a closing ] stands for itself as a character, and does not denote the end of the matching set), and - must be first or last (best to leave it always to the end of the matching set) - anywhere else it is seen as a range marker. To include (or exclude, if using the [^.....] syntax) a space, just type an actual physical space in the matching set.
Edit: What I said above is not entirely right. There is another special character in a matching set, namely ^. If it is used in the first position, it means "match any character OTHER THAN." In any other position it stands for itself. For example, '[^^]' will match any single character OTHER THAN ^ (the first ^ has special meaning, the second stands in for itself). And, a closing bracket ] stands for itself if it is the second character in brackets, if the first character is ^ (with its SPECIAL meaning). That is, to match any single character OTHER THAN ], we can use the matching pattern '[^]]'.

Using RegExp in sql to find rows that only contain 'x'

How do i use a regexp to only find rows where the first name only includes one type of character 'x' but it doesnt matter how many characters there are.
So far I came up with:
REGEXP_LIKE(LOWER(fst_name),'^x+$'))
possible rows I am looking for:
'x'
'xx'
'xxx'
'xxxxxxxxx'
So im interpreting this as meaning find the rows where x is at the beginning and the end of the field and there can be only x's inbetween. Am I interpreting this correctly?
or is it possible to have: 'xxxxxxaxxxxx'
Your regex is correct:
^x+$
^ is the "start" anchor
x is the character for which you are searching. I assume it isn't a regex metacharacter
+ is the "one or more" quantifier
$ is the "end" anchor
So I would interpret your regex to match all of the cases you supplied, and would not match something like 'xxxxaxxxx'. http://regex101.com/r/dE8vU6
It's been long enough since I used Oracle that I don't recall whether your REGEX_LIKE syntax is correct there, but it seems right to me.

Regular expression to match specific variations of function

I am trying to construct a regular expression to find the text of the following variations.
NSLocalizedString(#"TEXT")
NSLocalizedStringFromTable(#"TEXT")
NSLocalizedStringWithDefaultValue(#"TEXT")
...
The goal is to extract TEXT. I have been able to construct a regex for each individual function or macro, e.g., (?<=NSLocalizedString)\(#"(.*?)". However, I am looking for a solution that does the job no matter what the name of the function as long as it starts with NSLocalizedString.
I assumed it was as simple as (?<=NSLocalizedString\w+)\(#"(.*?)", but that does't seem to do the trick.
How about this one?
/NSLocalizedString\w*\(#"(.*)"\)/
Explanation:
NSLocalizedString 'NSLocalizedString'
\w+ word characters (a-z, A-Z, 0-9, _) (0 or
more times (matching the most amount
possible))
\( '('
#" '#"'
( group and capture to \1:
.* any character except \n (0 or more times
(matching the most amount possible))
) end of \1
" '"'
\) ')'
The only reason your regex doesn't work is because the regex engine doesn't support variable length lookbehinds. The (?<=NSLocalizedString\w+) is variable length so can't be used.
Firstly it needs to be \w* not \w+, to allow your first example string to match.
If you move the \w* outside the lookbehind (?<=NSLocalizedString)\w* it will work just fine.
Alternatively, since you have to use a capturing group to grab the text value anyway, theres no need for the lookbehind at all. Change the (?<= to a (?: and it becomes a non-capturing group (which can be variable length), and then just grab your text value from group 1.
Your attempt was:
(?<=NSLocalizedString\w+)\(#"(.*?)"
Both of these minor changes should make it work:
(?<=NSLocalizedString)\w*\(#"(.*?)"
(?:NSLocalizedString\w*)\(#"(.*?)"
The following is actually not supported in Objective-C:
The solution that will extract exactly TEXT without using any groups is:
NSLocalizedString\w*\(#"\K[^"]*
It avoids the need to use a negative lookbehind (which can't be used for reasons I explain below) by using the \K modifier, which chops off anything before it from the match.