Regex for letters, digits, no spaces - objective-c

I'm trying to create a regex that checks for 6-12 characters, one being a digit, the rest being any characters, no spaces. Can regex do this? I'm trying to do this in Objective-C and I'm not familiar with regex at all. I've been reading a couple of tutorials, but most cover simple cases such as matching a number or a set of numbers, not exactly what I'm looking for. I can do it with methods, but I was wondering if that would be too slow, and I figured I could try learning something new.
asdfg1 == ok
asdfg 1 != ok
asdfgh != ok
123456 != ok
asdfasgdasgdasdfasdf != ok

Use this regex: ^(?=.*\d)(?=.*[a-zA-Z])[^ ]{6,12}$

It seems that you mean "letter" when you say "character", right? And (thanks to burning_LEGION for pointing that out) there may be only one digit?
In that case, use
^(?=\D*\d\D*$)[^\W_]{6,12}$
Explanation:
^ # Start of string
(?=\D*\d\D*$) # Assert that there is exactly one digit in the string
[^\W_] # Match a letter or digit (explanation below)
{6,12} # 6-12 times
$ # End of string
[^\W_] might look a little odd. How does it work? Well, \w matches any letter, digit or underscore. \W matches anything that \w doesn't match. So [^\W] (meaning "match any character that is not not alphanumeric/underscore") is essentially the same as \w, but by adding _ to this character class, we can remove the underscore from the list of allowed characters.
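As a quick check of this pattern against the examples from the question, here is a minimal Python sketch. Python is used only for illustration; the helper name is_valid is hypothetical, and the re.ASCII flag is an assumption that keeps \d, \D and \W limited to ASCII characters.

```python
import re

# Exactly one digit, 6-12 characters, only letters and digits.
# re.ASCII is an assumption of this sketch: it restricts \d, \D and \W to ASCII.
PATTERN = re.compile(r"^(?=\D*\d\D*$)[^\W_]{6,12}$", re.ASCII)


def is_valid(candidate: str) -> bool:
    """Return True if the candidate satisfies the pattern (hypothetical helper)."""
    return PATTERN.match(candidate) is not None


# The five examples from the question.
for sample in ["asdfg1", "asdfg 1", "asdfgh", "123456", "asdfasgdasgdasdfasdf"]:
    print(sample, "->", is_valid(sample))
```

Only asdfg1 passes: the others contain a space, no digit, more than one digit, or too many characters.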

I haven't tried it, but I think this is the answer:
(^[^\d\x20]*\d[^\d\x20]*$){6,12}

This is for one digit: ^[^\d\x20]{0,11}\d{1}[^\d\x20]{0,11}$ but I can't limit it to a length of 6-12 with this regex alone. You can check the length with another function first and, if it is between 6 and 12, check the string with the regex I wrote.
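The two-step idea (check the length first, then apply the one-digit regex) could look roughly like this in Python. This is only a sketch of the approach; the function name has_one_digit_and_valid_length is hypothetical.

```python
import re

# Exactly one digit, surrounded by characters that are neither digits nor spaces.
ONE_DIGIT = re.compile(r"^[^\d\x20]{0,11}\d{1}[^\d\x20]{0,11}$")


def has_one_digit_and_valid_length(candidate: str) -> bool:
    """Step 1: check the length. Step 2: check the one-digit regex."""
    if not 6 <= len(candidate) <= 12:
        return False
    return ONE_DIGIT.match(candidate) is not None


for sample in ["asdfg1", "asdfg 1", "asdfgh", "123456", "asdfasgdasgdasdfasdf"]:
    print(sample, "->", has_one_digit_and_valid_length(sample))
```

Unlike the letters-and-digits pattern, this version accepts any non-space, non-digit character around the single digit, so punctuation is allowed.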

Trino regexp_replace this character in the beginning but not in the middle Trino [duplicate]

I am a complete Reg-exp noob, so please bear with me. Tried to google this, but haven't found it yet.
What would be an appropriate way of writing a Regular expression matching files starting with a dot, such as .buildpath or .htaccess?
Thanks a lot!
In most regex languages, ^\. or ^[.] will match a leading dot.
The ^ matches the beginning of a string in most languages, so this will match a leading dot. You need to add your filename expression to it.
^\.
Likewise, $ will match the end of a string.
You may need to substitute the \ for the respective language escape character. However, under PowerShell the regex I use is: ^(\.)+\/
Test case:
"../NameOfFile.txt" -match '^(\.)+\/'
works, while
"_./NameOfFile.txt" -match '^(\.)+\/'
does not.
Naturally, you may ask, well what is happening here?
The (\.) matches a literal . and the + that follows repeats that group one or more times.
Finally, the \/ requires a / path separator right after the dots.
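The same test can be reproduced outside PowerShell. Here is a minimal Python sketch, assuming the pattern ^(\.)+\/ quoted in this answer; in Python the / needs no escaping, and the helper name starts_with_dot_path is hypothetical.

```python
import re

# One or more literal dots at the start of the string, followed by a forward slash.
LEADING_DOTS = re.compile(r"^(\.)+/")


def starts_with_dot_path(path: str) -> bool:
    """Return True if the path begins with dots followed by a slash."""
    return LEADING_DOTS.search(path) is not None


print(starts_with_dot_path("../NameOfFile.txt"))  # True
print(starts_with_dot_path("_./NameOfFile.txt"))  # False
```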
It depends a bit on the regular expression library you use, but you can do something like this:
^\.\w+
The ^ anchors the match to the beginning of the string, the \. matches a literal period (since an unescaped . in a regular expression typically matches any character), and \w+ matches 1 or more "word" characters (alphanumeric plus _).
See the perlre documentation for more info on Perl-style regular expressions and their syntax.
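To illustrate ^\.\w+ on a few file names, here is a small Python sketch. The list of names is made up for the example.

```python
import re

# A literal dot at the start, followed by one or more word characters.
DOTFILE = re.compile(r"^\.\w+")

# Hypothetical file names for the example.
names = [".buildpath", ".htaccess", "index.php", "notes.txt", "."]

dotfiles = [name for name in names if DOTFILE.match(name)]
print(dotfiles)  # ['.buildpath', '.htaccess']
```

A lone "." is rejected because \w+ needs at least one word character after the dot.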
It depends on what characters are legal in a filename, which depends on the OS and filesystem.
For example, in Windows that would be:
^\.[^<>:"/\\\|\?\*\x00-\x1f]+$
The above expression means:
Match a string starting with the literal character .
Followed by at least one character which is not one of (whole class of invalid chars follows)
I used this as reference regarding which chars are disallowed in filenames.
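A short Python sketch of the Windows-oriented pattern, copied as written above; the helper name is_windows_dotfile is hypothetical, and the sample names are made up.

```python
import re

# A leading dot followed by one or more characters that Windows allows in file names.
WINDOWS_DOTFILE = re.compile(r'^\.[^<>:"/\\\|\?\*\x00-\x1f]+$')


def is_windows_dotfile(name: str) -> bool:
    """Return True for names like .htaccess that contain no disallowed characters."""
    return WINDOWS_DOTFILE.match(name) is not None


for name in [".htaccess", ".build:path", "htaccess", ".a?b", ".my file"]:
    print(repr(name), "->", is_windows_dotfile(name))
```

Note that this pattern, unlike ^\.\w+, accepts spaces and other punctuation that the file system permits.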
To match a string starting with a dot in Java, you can write a simple expression:
^\\..*
^ means the match must begin at the start of the string
\. (written \\. inside a Java string literal) means it will start with the literal "."
.* means the dot will be followed by 0 or more characters

Regex like telephone number on Hive without prefix (+01)

We have a problem with a regular expression on hive.
We need to exclude numbers that start with +37 or 0037 (for these, regexp_like should return false), and the number must not contain letters or spaces.
We're trying with this one:
regexp_like(tel_number,'^\+37|^0037+[a-zA-ZÀÈÌÒÙ ]')
but it doesn't work.
Edit: we want it to come out from the select as true (correct number) or false.
To exclude numbers which start with +01 or +001 or +0001 and contain only digits, without spaces or letters:
... WHERE tel_number NOT rlike '^\\+0{1,3}1\\d+$'
Special characters like + and character classes like \d should be escaped in Hive with a double backslash: \\+ and \\d.
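To see which numbers this pattern describes, it can be tried outside Hive. The Python sketch below uses a raw string with single backslashes, which is assumed to be what the regex engine sees once Hive has processed the doubled backslashes; the helper name is_excluded is hypothetical.

```python
import re

# +, then one to three zeros, then 1, then one or more digits up to the end.
EXCLUDED_PREFIX = re.compile(r"^\+0{1,3}1\d+$")


def is_excluded(tel_number: str) -> bool:
    """Return True for numbers the WHERE ... NOT rlike clause would filter out."""
    return EXCLUDED_PREFIX.match(tel_number) is not None


for number in ["+01555123", "+0001555123", "+00001555", "+01 555", "+441234"]:
    print(number, "->", is_excluded(number))
```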
The general question is, if you want to describe a malformed telephone number in your regex and exclude everything that matches the pattern or if you want to describe a well-formed telephone number and include everything that matches the pattern.
Which way to go, depends on your scenario. From what I understand of your requirements, adding "not starting with 0037 or +37" as a condition to a well-formed telephone number could be a good approach.
The pattern would be like this:
1. Your number can start with either + or 00: ^(\+|00)
2. It cannot be followed by 37, which in regex can be expressed by the following set of alternatives:
   a. It is followed first by a 3, then by anything but 7: 3[0-689]
   b. It is followed first by anything but 3, then by any digit: [0-24-9]\d
3. After that there is a sequence of digits of undefined length (at least one) until the end of the string: \d+$
Putting everything together:
^(\+|00)(3[0-689]|[0-24-9]\d)\d+$
You can play with this regex here and see if this fits your needs: https://regex101.com/r/KK5rjE/3
Note: as leftjoin has pointed out: To use this regex in hive you might need to additionally escape the backslashes \ in the pattern.
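The combined pattern can be checked against a few made-up numbers with a short Python sketch; the helper name is_well_formed is hypothetical.

```python
import re

# Prefix + or 00, then two digits that are not "37", then at least one more digit.
WELL_FORMED = re.compile(r"^(\+|00)(3[0-689]|[0-24-9]\d)\d+$")


def is_well_formed(tel_number: str) -> bool:
    """Return True for numbers with an international prefix other than 37."""
    return WELL_FORMED.match(tel_number) is not None


for number in ["+391234", "0044123", "+371234", "00371234", "12345", "+39 1234"]:
    print(number, "->", is_well_formed(number))
```

Because the pattern requires + or 00 at the start, a number without any prefix (such as 12345) is rejected as well.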
You can use
regexp_like(tel_number,'^(?!\\+37|0037)\\+?\\d+$')
See the regex demo. Details:
^ - start of string
(?!\+37|0037) - a negative lookahead that fails the match if there is +37 or 0037 immediately to the right of the current location
\+? - an optional + sign
\d+ - one or more digits
$ - end of string.
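The negative-lookahead version can be tried the same way. This Python sketch uses single backslashes in a raw string; the helper name is_accepted and the sample numbers are made up.

```python
import re

# Reject +37 / 0037 at the start, then allow an optional + and digits only.
ACCEPTED = re.compile(r"^(?!\+37|0037)\+?\d+$")


def is_accepted(tel_number: str) -> bool:
    """Return True if the number does not start with +37 or 0037 and is digits only."""
    return ACCEPTED.match(tel_number) is not None


for number in ["+391234", "12345", "037123", "+371234", "00371234", "12 345", "12a45"]:
    print(number, "->", is_accepted(number))
```

In contrast to a pattern that requires a + or 00 prefix, this one also accepts numbers without any prefix.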

regex not working correctly when the test is fine

For my database, I have a list of company numbers where some of them start with two letters. I have created a regex which should eliminate these from a query and according to my tests, it should. But when executed, the result still contains the numbers with letters.
Here is my regex, which I've tested on https://www.regexpal.com
([^A-Z+|a-z+].*)
I've tested it against numerous variations such as SC08093, ZC000191 and NI232312 which shouldn't match and don't in the tests, which is fine.
My SQL query looks like this:
SELECT companyNumber FROM company_data
WHERE companyNumber ~ '([^A-Z+|a-z+].*)' order by companyNumber desc
To summarise, strings like SC08093 should not match as they start with letters.
I've read through the documentation for postgres but I couldn't seem to find anything regarding this. I'm not sure what I'm missing here. Thanks.
The ~ '([^A-Z+|a-z+].*)' does not work because ~ is a regex matching operation that returns true even upon a partial match (it does not require a full string match, and thus the pattern can match anywhere in the string). [^A-Z+|a-z+].* matches any char other than a letter from A to Z, +, | or a letter from a to z, and then any zero or more chars, anywhere inside a string.
You may use
WHERE companyNumber NOT SIMILAR TO '[A-Za-z]{2}%'
See the online demo
Here, NOT SIMILAR TO returns the inverse result of the SIMILAR TO operation. This SIMILAR TO operator accepts patterns that are almost regex patterns, but are also like regular wildcard patterns. NOT SIMILAR TO '[A-Za-z]{2}%' means all records that start with two ASCII letters ([A-Za-z]{2}) and having anything after (%) are NOT returned and all others will be returned. Note that SIMILAR TO requires a full string match, same as LIKE.
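The difference between a partial match and a full-string match can be sketched in Python. Here re.search stands in for the partial matching of ~ and re.fullmatch approximates the full-string requirement of SIMILAR TO; both are assumptions made for illustration, not PostgreSQL behaviour reproduced exactly. The helper name and sample values are hypothetical.

```python
import re

ORIGINAL = re.compile(r"([^A-Z+|a-z+].*)")

# A partial match is found inside SC08093: it simply starts at the first digit.
match = ORIGINAL.search("SC08093")
print(match.group(0))  # 08093


def starts_with_two_letters(value: str) -> bool:
    """Rough equivalent of SIMILAR TO '[A-Za-z]{2}%' (whole string must match)."""
    return re.fullmatch(r"[A-Za-z]{2}.*", value, re.DOTALL) is not None


values = ["SC08093", "ZC000191", "NI232312", "12345678"]
kept = [v for v in values if not starts_with_two_letters(v)]
print(kept)  # ['12345678']
```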
Your pattern: [^A-Z+|a-z+].* means "a string where at least some characters are not A-Z" - to extend that to the whole string you would need to use an anchored regex as shown by S-Man (the group defined with (..) isn't really necessary btw)
I would probably use a regex that specifies the pattern to exclude and then use !~ instead.
where company !~ '^[A-Za-z].*$'
^[A-Za-z].*$ means "starts with a letter" and the !~ means "does not match"
or
where not (company ~ '^[A-Za-z].*$')
Not start with a letter could be done with
WHERE company ~ '^[^A-Za-z].*'
demo: db<>fiddle
The first ^ marks the beginning. The [^A-Za-z] says "no letter" (including small and capital letters).
Edit: Changed [A-z] into the more precise [A-Za-z] (Why is this regex allowing a caret?)
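A small Python sketch of the anchored pattern, together with a check of why [A-z] is less precise than [A-Za-z]: the ASCII range from A to z also covers the characters [ \ ] ^ _ and the backtick. The sample values are made up.

```python
import re

# Anchored at the start: the first character must not be a letter.
NOT_LETTER_FIRST = re.compile(r"^[^A-Za-z].*")

numbers = ["SC08093", "08093412", "ZC000191", "12345678"]
kept = [n for n in numbers if NOT_LETTER_FIRST.search(n)]
print(kept)  # ['08093412', '12345678']

# [A-z] spans ASCII 65-122, which includes the caret (ASCII 94).
caret_in_wide_range = re.fullmatch(r"[A-z]", "^") is not None
caret_in_precise_range = re.fullmatch(r"[A-Za-z]", "^") is not None
print(caret_in_wide_range, caret_in_precise_range)  # True False
```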

Teradata regular expressions, 0 or 1 spaces

In Teradata, I'm looking for one regular expression pattern that would allow me to find a pattern of some numbers, then a space or maybe no space, and then 'SF'. It should return 7 in both cases below:
SELECT
REGEXP_INSTR('12345 1000SF', pattern),
REGEXP_INSTR('12345 1000 SF', pattern)
Or, my actual goal is to extract the 1000 in both cases if there's an easier way, probably using REGEXP_SUBSTR. More details are below if you need them.
I have a column that contains free text and I would like to extract the square footage. But, in some cases, there is a space between the number and 'SF' and in some cases there is not:
'other stuff 1000 SF'
'other stuff 1000SF'
I am trying to use the REGEXP_INSTR function to find the starting position. Through google, I have found the pattern for the first to be
'([0-9])+ SF'
When I try the pattern for the second, I try
'([0-9])+SF'
and I get the error
SELECT Failed. [2662] SUBSTR: string subscript out of bounds
I've also found answers to similar questions, but they don't work for Teradata. For example, I don't think you can use ? in Teradata.
The error message indicates you're using SUBSTR, not REGEXP_SUBSTR.
Try this:
RegExp_Substr(col, '[0-9]*(?= {0,1}SF)')
Find multiple digits followed by a single optional blank followed by SF and extract those digits.
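The lookahead idea can be tried in Python with the same pattern. This is only a sketch of how the regex behaves, not of Teradata's functions; the helper names square_feet and one_based_position are hypothetical.

```python
import re

# Digits that are followed by an optional single space and then SF.
DIGITS_BEFORE_SF = re.compile(r"[0-9]*(?= {0,1}SF)")


def square_feet(text: str):
    """Return the digits before SF, or None if SF does not occur."""
    match = DIGITS_BEFORE_SF.search(text)
    return match.group(0) if match else None


def one_based_position(text: str):
    """Return the 1-based start of the match, similar in spirit to REGEXP_INSTR."""
    match = DIGITS_BEFORE_SF.search(text)
    return match.start() + 1 if match else None


print(square_feet("other stuff 1000 SF"))    # 1000
print(square_feet("other stuff 1000SF"))     # 1000
print(one_based_position("12345 1000SF"))    # 7
print(one_based_position("12345 1000 SF"))   # 7
```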
I would pattern it like this:
\b(\d+)\s*[Ss][Ff]\b
\b # word boundary
(\d+) # 1 or more digits (captured)
\s* # 0 or more white-space characters
[Ss] # character class
[Ff] # character class
\b # word boundary
Demo
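A minimal Python sketch of the capture-group variant, where the digits are read from group 1 instead of relying on a lookahead. The helper name extract_square_feet is hypothetical.

```python
import re

# Word boundary, captured digits, optional whitespace, SF in any case, word boundary.
SQUARE_FEET = re.compile(r"\b(\d+)\s*[Ss][Ff]\b")


def extract_square_feet(text: str):
    """Return the captured number as an int, or None when nothing matches."""
    match = SQUARE_FEET.search(text)
    return int(match.group(1)) if match else None


for text in ["other stuff 1000 SF", "other stuff 1000SF", "12345 1000sf", "1000 SFX"]:
    print(repr(text), "->", extract_square_feet(text))
```

The trailing \b keeps strings such as 1000 SFX from matching, since SF must end at a word boundary.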

Regular expression for pattern xxxx/xxxxU

I'm looking for a regular expression to match the pattern xxxx/xxxxU, where x can be 0-9 and the "U" at the end is optional.
Valid examples: 1111/1111, 1111/1111U
Invalid examples: 1111/1111Z, 111/1111
I could reach until '[0-9]{4}/[0-9]{4}', but I'm not sure how to handle the optional "U" at the end.
Always match numbers-slash-numbers; U included, if present:
[0-9]{4}/[0-9]{4}U?
or if you replace the number ranges with \d (for a digit character):
\d{4}/\d{4}U?
The ? means zero or one of the preceding character. So zero or one of U. Test it
The entire string 1111/2222U will be matched, while the match for 1111/2222Z will include the digits-slash-digits part but not the Z.
Only match if string ends in a digit or U:
If a string fragment ending in any letter other than U is not to be matched at all, try something like:
^\d{4}/\d{4}U?$
which matches if the numbers-slash-numbers plus optional U is the only content in the string (test it) or
\d{4}/\d{4}U?(\s|$)
which matches if the numbers-slash-numbers plus optional U is followed by either a white space character (included in the match) or the end of the string. (Test it.)
(Note: the test it links show the "/" between the numbers escaped with "\" [e.g. "\/"], something required by that implementation. I'm not familiar with Oracle's regex syntax, so this may not be required on that platform.)
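The difference between the unanchored and the anchored pattern shows up on the invalid examples from the question. Here is a minimal Python sketch, used only to illustrate the regex behaviour rather than Oracle's functions:

```python
import re

UNANCHORED = re.compile(r"\d{4}/\d{4}U?")
ANCHORED = re.compile(r"^\d{4}/\d{4}U?$")

# Unanchored: the digits-slash-digits part of 1111/1111Z still matches.
partial = UNANCHORED.search("1111/1111Z")
print(partial.group(0))  # 1111/1111

# Anchored: only the valid examples from the question pass.
for value in ["1111/1111", "1111/1111U", "1111/1111Z", "111/1111"]:
    print(value, "->", ANCHORED.search(value) is not None)
```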
I'd use '[0-9]{4}/[0-9]{4}U?' or '[0-9]{4}/[0-9]{4}U{0,1}'
Found at: http://docs.oracle.com/cd/B12037_01/appdev.101/b10795/adfns_re.htm
You could try this expression:
[0-9]{4}/[0-9]{4}U?
The ? means: optional (0 or 1). Have a look at this useful regex overview table.
Thanks for all your comments, but there were some cases which your answer did not cover.
'^[0-9]{4}/[0-9]{4}U?$'
This works for all of my cases above.