Allow one random character at any position in string pattern - sql

I want to find a specific pattern in a string while tolerating one deviating character (or a number of them that I can set) at any position in the string.
For example, if I am looking for a date in the format yyyy-mm-dd, I would also like to accept:
'2020-08x-12'
'x2020-08-01'
'2020-08-x12'
So far I've got the standard pattern match:
SELECT foo
FROM bar
WHERE foo LIKE '[1-2][0-9][0-9][0-9][-][0-1][0-9][-][0-3][0-9]'
Now I would like to allow a random character in between (max 1 character) and still recognize the pattern.

SQL Server is not optimal for this but you can use a massive OR and LIKE:
WHERE foo LIKE '[1-2][0-9][0-9][0-9][-][0-1][0-9][-][0-3][0-9]' OR
foo LIKE '[1-2][0-9][0-9][0-9][-][0-1][0-9][-][0-3][0-9]_' OR
foo LIKE '[1-2][0-9][0-9][0-9][-][0-1][0-9][-][0-3]_[0-9]' OR
foo LIKE '[1-2][0-9][0-9][0-9][-][0-1][0-9][-]_[0-3][0-9]' OR
. . .
foo LIKE '_[1-2][0-9][0-9][0-9][-][0-1][0-9][-][0-3][0-9]'
The _ matches exactly one character. So the idea is to put it in the pattern at every possible position.
An alternative method that should work is to allow any "random" characters between the known ones and then check the length:
WHERE foo LIKE '%[1-2]%[0-9]%[0-9]%[0-9]%[-]%[0-1]%[0-9]%[-]%[0-3]%[0-9]%' AND
LEN(foo) IN (10, 11)
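Outside the database, the same "strict pattern plus at most one extra character" check can be sketched in a few lines. This is only an illustration in Python, not SQL Server code; the function name matches_with_one_extra is made up for the example. It deletes each character in turn and tests the remainder against the strict pattern:

```python
import re

# Strict yyyy-mm-dd shape, using the same character classes as the LIKE pattern.
STRICT = re.compile(r"[12][0-9]{3}-[01][0-9]-[0-3][0-9]")

def matches_with_one_extra(s: str) -> bool:
    """True if s fits the strict pattern as is, or after deleting exactly one character."""
    if STRICT.fullmatch(s):
        return True
    # Try removing each character in turn and re-check the strict pattern.
    return any(STRICT.fullmatch(s[:i] + s[i + 1:]) for i in range(len(s)))

for candidate in ["2020-08x-12", "x2020-08-01", "2020-08-x12", "2020x-08x-12"]:
    print(candidate, matches_with_one_extra(candidate))
```

To tolerate more than one stray character, the deletion step would have to be applied repeatedly, which gets expensive quickly.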

Related

Massive change polish marks in notepad++ [duplicate]

Consider the following regex:
([a-zA-Z])([a-zA-Z]?)/([a-zA-Z])([a-zA-Z]?)
If the text is: a/b
the capturing groups will be:
/1 'a'
/2 ''
/3 'b'
/4 ''
And if the text is: aa/b
the capturing groups will be:
/1 'a'
/2 'a'
/3 'b'
/4 ''
Suppose, I want to find and replace this string in Notepad++ such that if /2 or /4 are empty (as in the first case above), I prepend c.
So, the text a/b becomes ca/cb.
And the text aa/b becomes aa/cb
I use the following regex for replacing:
(?(2)\1\2|0\1)/(?(4)\3\4|0\3)
But Notepad++ is treating ? literally in this case, and not as a conditional identifier. Any idea what I am doing wrong?
The syntax in the conditional replacement is
(?{GROUP_MATCHED?}REPLACEMENT_IF_YES:REPLACEMENT_IF_NO)
The { and } are necessary to avoid ambiguity when you deal with groups higher than 9 and with named capture groups.
Since Notepad++ uses Boost-Extended Format String Syntax, see this Boost documentation:
The character ? begins a conditional expression, the general form is:
?Ntrue-expression:false-expression
where N is decimal digit.
If sub-expression N was matched, then true-expression is evaluated and sent to output, otherwise false-expression is evaluated and sent to output.
You will normally need to surround a conditional-expression with parenthesis in order to prevent ambiguities.
For example, the format string (?1foo:bar) will replace each match found with foo if the sub-expression $1 was matched, and with bar otherwise.
For sub-expressions with an index greater than 9, or for access to named sub-expressions use:
?{INDEX}true-expression:false-expression
or
?{NAME}true-expression:false-expression
So, use ([a-zA-Z])([a-zA-Z])?/([a-zA-Z])([a-zA-Z])? and replace with (?{2}$1$2:c$1)/(?{4}$3$4:c$3).
The second problem is that you placed the ? quantifier inside the capturing group, making the pattern inside the group optional, but not the whole group. That made the group always "participating in the match", and the condition would be always "true" (always matched). ? should quantify the group.
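The difference between an optional group and a group with optional content is easy to see in other regex engines as well. Here is a rough Python sketch (Python's re module, not Notepad++'s Boost engine; the helper names are invented for the example) in which an unmatched optional group comes back as None, so a replacement callback can branch on it:

```python
import re

# The quantifier is on the group, so groups 2 and 4 may not participate at all.
PATTERN = re.compile(r"([a-zA-Z])([a-zA-Z])?/([a-zA-Z])([a-zA-Z])?")

def _side(first: str, second) -> str:
    # second is None when the optional group did not participate in the match.
    return first + second if second is not None else "c" + first

def prepend_c(text: str) -> str:
    """Prepend 'c' to each side of the slash that has only one letter."""
    return PATTERN.sub(
        lambda m: _side(m.group(1), m.group(2)) + "/" + _side(m.group(3), m.group(4)),
        text,
    )

print(prepend_c("a/b"))   # ca/cb
print(prepend_c("aa/b"))  # aa/cb
```

With the original ([a-zA-Z]?) form, group 2 would be an empty string rather than None, which is the "always participating" behaviour described above.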

Teradata regular expressions, 0 or 1 spaces

In Teradata, I'm looking for one regular expression pattern that would allow me to find a pattern of some numbers, then a space or maybe no space, and then 'SF'. It should return 7 in both cases below:
SELECT
REGEXP_INSTR('12345 1000SF', pattern),
REGEXP_INSTR('12345 1000 SF', pattern)
Or, my actual goal is to extract the 1000 in both cases if there's an easier way, probably using REGEXP_SUBSTR. More details are below if you need them.
I have a column that contains free text and I would like to extract the square footage. But, in some cases, there is a space between the number and 'SF' and in some cases there is not:
'other stuff 1000 SF'
'other stuff 1000SF'
I am trying to use the REGEXP_INSTR function to find the starting position. Through Google, I have found the pattern for the first to be
'([0-9])+ SF'
When I try the pattern for the second, I try
'([0-9])+SF'
and I get the error
SELECT Failed. [2662] SUBSTR: string subscript out of bounds
I've also found answers to similar questions, but they don't work for Teradata. For example, I don't think you can use ? in Teradata.
The error message indicates you're using SUBSTR, not REGEXP_SUBSTR.
Try this:
RegExp_Substr(col, '[0-9]+(?= {0,1}SF)')
Find multiple digits followed by a single optional blank followed by SF and extract those digits.
I would pattern it like this:
\b(\d+)\s*[Ss][Ff]\b
\b # word boundary
(\d+) # 1 or more digits (captured)
\s* # 0 or more white-space characters
[Ss] # character class
[Ff] # character class
\b # word boundary
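To check that pattern against the sample strings without a Teradata session, here is a small sketch using Python's re module. It is an assumption that Teradata's regex engine behaves the same way for these constructs, and the function name square_feet is just for illustration:

```python
import re

# Same pattern as above: digits, optional whitespace, then SF in either case.
SQFT = re.compile(r"\b(\d+)\s*[Ss][Ff]\b")

def square_feet(text: str):
    """Return the number in front of 'SF' as an int, or None if there is none."""
    m = SQFT.search(text)
    return int(m.group(1)) if m else None

for sample in ["other stuff 1000 SF", "other stuff 1000SF", "12345 1000SF"]:
    print(sample, "->", square_feet(sample))
```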

regex capture middle of url

I'm trying to figure out the base regex to capture the middle of a google url out of a sql database.
For example, a few links:
https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234
https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789
What would be the regex to capture the text to get dodge+durango , or jeep+cherokee+crossover ? (It's alright that the + still be in there.)
My Attempts:
1)
\b[=.]\W\b\w{5}\b[+.]?\w{7}
, but this clearly does not work: it is hard-coded and would only fit something like the dodge durango example (it would extract "dodge+durango").
2) Using a positive lookahead,
[^+]( ?=&id )
but I am not fully sure how to use this, as it only grabs the one character before the & symbol.
How can I extract a string of (potentially) any length with any number of + delimiters between the "model=" and "&id" boundaries?
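Seems like you could use regexp_replace and access match groups. Since only the matched part of the string is replaced, the pattern has to cover the whole input so that nothing but the captured group remains:
regexp_replace(input, '^.*model=([^&]*).*$', E'\\1')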
From the PostgreSQL documentation:
The regexp_replace function provides substitution of new text for
substrings that match POSIX regular expression patterns. It has the
syntax regexp_replace(source, pattern, replacement [, flags ]). The
source string is returned unchanged if there is no match to the
pattern. If there is a match, the source string is returned with the
replacement string substituted for the matching substring. The
replacement string can contain \n, where n is 1 through 9, to indicate
that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted. Write \\ if you need to put a literal backslash in the
replacement text. The flags parameter is an optional text string
containing zero or more single-letter flags that change the function's
behavior. Flag i specifies case-insensitive matching, while flag g
specifies replacement of each matching substring rather than only the
first one.
I may be misunderstanding, but if you want to get the model, just select everything between model= and the ampersand (&).
regexp_matches(input, 'model=([^&]*)')
model=: Match literally
([^&]*): Capture
[^&]*: Anything that isn't an ampersand
*: Unlimited times
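The same capture can be tried outside the database. The sketch below uses Python rather than SQL, and the function names are made up for the example. It shows the regex version next to a query-string parser, which is an alternative I am assuming is acceptable if the value is ever needed decoded (the parser turns + into a space):

```python
import re
from urllib.parse import urlsplit, parse_qs

def model_raw(url: str):
    """Capture everything between 'model=' and the next ampersand, '+' included."""
    m = re.search(r"model=([^&]*)", url)
    return m.group(1) if m else None

def model_decoded(url: str):
    """Let the standard library parse the query string; '+' is decoded to a space."""
    values = parse_qs(urlsplit(url).query).get("model")
    return values[0] if values else None

url = "https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234"
print(model_raw(url))      # dodge+durango
print(model_decoded(url))  # dodge durango
```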

Find character sequence at specific position in string

I need to use SQL to find a sequence of characters at a specific position in a string.
Example:
atcgggatgccatg
I need to find 'atg' starting at character 7 or at character 7-9, either way would work. I don't want to find the 'atg' at the end of the string. I know about LIKE but couldn't find how to use it for a specific position.
Thank you
In MS Access, you could write this as follows (six, seven, or eight leading ? wildcards make 'atg' start at character 7, 8, or 9):
where col like '??????atg*' or
col like '???????atg*' or
col like '????????atg*'
However, if you are interested in this type of comparison, you might consider using a database that supports regular expressions.
If you have a look at this page you'll find that LIKE is entirely capable of doing what you want. To find something at, for example, a 3 char offset you can use something like this
SELECT * FROM SomeTable WHERE [InterestingField] LIKE '___FOO%'
The '_' (underscore) is a place marker for any char. Having 3 "any char" markers in the pattern, with a trailing '%', means that the above SQL will match anything with FOO starting from the fourth char, and then anything else (including nothing).
To look for something 7 chars in, use 7 underscores.
Let me know if this isn't quite clear.
EDIT: I quoted SQL Server stuff, not Access. Swap in '?' where I have '_', use '*' instead of '%', and check out this link instead.
Revised query:
SELECT * FROM SomeTable WHERE [InterestingField] LIKE '???FOO*'
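The off-by-one between "offset" and "starting character" is easy to get wrong, so here is a small Python sketch of the rule (the function names are invented for the example, and the pattern builder assumes Access-style ? and * wildcards): a match starting at 1-based position p needs p - 1 single-character wildcards in front of it.

```python
def has_at(seq: str, sub: str, position: int) -> bool:
    """True if sub occurs in seq starting at the given 1-based position."""
    return seq.startswith(sub, position - 1)

def access_like_pattern(sub: str, position: int) -> str:
    """Build the equivalent Access LIKE pattern: position - 1 wildcards, then sub."""
    return "?" * (position - 1) + sub + "*"

seq = "atcgggatgccatg"
print(has_at(seq, "atg", 7))          # True: the 'atg' in the middle
print(has_at(seq, "atg", 8))          # False
print(access_like_pattern("atg", 7))  # ??????atg*
```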

regex pattern to match string in Objective C

Here is a regex pattern I created in Objective C:
^\n?([#]{1,2}$|[*]{1,2}$|[0-9]{1,3}.$)
I want to match:
starts with \n or empty
ends with # or * or .
if ends with . there will be 1 or 2 or 3 digits in between
If ends with # or *, there could be 1 more # or * in between
The regex I created matches '\n1#' which is not what I want.
Can anyone help me correct this? Is this the fastest one? The regex will be used frequently, so I want it to be as fast as possible.
UPDATE:
Here are some sample strings for testing:
"\n#", "11*1", "1#", "a1.", "111*", "\n1#", "\n11.", "a11.", "1. ", "*1."
The 1# and 111* were matched. Not sure what went wrong.
You're matching 1# and 111* because of the [0-9]{1,3}. alternative. You haven't escaped the . there, so this group matches any sequence of 1 to 3 digits followed by any single character.
What you're looking for is
^\n?(#{1,2}|\*{1,2}|[0-9]{1,3}\.)$
Properly escaped in ObjC, it would be
@"^\n?(#{1,2}|\\*{1,2}|[0-9]{1,3}\\.)$"
If this regex is used quite a lot, you might want to cache the NSRegularExpression object to avoid compiling it every time.
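As a quick sanity check of the corrected pattern against the sample strings from the question, here is a sketch in Python rather than Objective-C. It assumes Python's re module treats these constructs the same way NSRegularExpression does. Compiling the pattern once at module level plays the same role as caching the NSRegularExpression object:

```python
import re

# Compiled once and reused, analogous to caching the NSRegularExpression.
PATTERN = re.compile(r"^\n?(#{1,2}|\*{1,2}|[0-9]{1,3}\.)$")

SAMPLES = ["\n#", "11*1", "1#", "a1.", "111*", "\n1#", "\n11.", "a11.", "1. ", "*1."]

# Keep only the sample strings that the corrected pattern accepts.
matched = [s for s in SAMPLES if PATTERN.search(s)]
print([repr(s) for s in matched])
```

Only "\n#" and "\n11." pass; 1# and 111* are rejected once the dot is escaped.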
Regexpal is very useful to test regular expressions.