RegEx and matching codes right to left - vb.net

Struggling a little bit with RegEx. I've got 4 codes in a string:
CODE4:CODE3:CODE2:CODE1
each code is optional apart from CODE1
So I could have ab:bc:de:fg
or
bc::fg
or
ab:::fg
In each case above, CODE1 = fg, and for the dear life of me I can't work out the RegEx.
It would be easy to do as a standard string parse, but unfortunately, because of business objects, it needs to be done via regex :-( and return fg via a vb.net Regex match, i.e. Groups("Code1") (I hope that makes sense)
Thanks in advance for any help
Ended up with a bit of RegEx that does the job, bit messy but it works
(^(?<code1>[\w]*)$)|(^(?<code2>[\w]*):(?<code1>[\w]*)$)|(^(?<code3>[\w]*):(?<code2>[\w]*):(?<code1>[\w]*)$)|(^(?<code4>[\w]*):(?<code3>[\w]*):(?<code2>[\w]*):(?<code1>[\w]*)$)
Ta all

There's no need to use a regular expression here.
I don't know what language you're using, but split the string on ':' and you'll have an array of codes.
If you really just want to validate whether a string is valid for this then
/(\w*:){0,3}\w+/
matches your description and the few examples you've given.
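For illustration, here is a minimal Python sketch of both suggestions (the question is about VB.NET; Python is used only to keep the example short, and split_codes is a made-up helper name). It uses fullmatch so the validation pattern has to cover the whole string, which the bare pattern above does not enforce by itself.

```python
import re

# Validation pattern from the answer above; fullmatch anchors it at both ends.
VALID = re.compile(r"(\w*:){0,3}\w+")

def split_codes(text):
    """Return the codes keyed right to left: the last field is always code1."""
    if not VALID.fullmatch(text):
        raise ValueError(f"not a valid code string: {text!r}")
    parts = text.split(":")
    # Reverse so that the right-most field becomes code1, the next one code2, etc.
    return {f"code{i}": part for i, part in enumerate(reversed(parts), start=1)}

print(split_codes("ab:bc:de:fg"))
print(split_codes("bc::fg"))
```

Fields that are absent on the left are simply missing from the result, while empty fields between colons come back as empty strings.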

I'm not sure why you have to match the codes right to left. Simply use a regular expression to pick apart the string:
/(.*):(.*):(.*):(.+)/
and then you have CODE1 in $4, CODE2 in $3, CODE3 in $2, CODE4 in $1.

(CODE1)?:(CODE2)?:(CODE3)?:CODE4 would work if leading colons don't matter (note the codes are numbered left to right here, so CODE4 is the mandatory one). Otherwise, if you can't have leading colons, enumerate:
(CODE1:(CODE2)?:(CODE3)?:|CODE2:(CODE3)?:|CODE3:)?CODE4
There's nothing special about the fact that the right-most part is mandatory, and the left-most parts aren't.
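If a single pattern with named groups is required, one possible shape is to nest the optional prefixes so that each extra code can only appear together with the colon that follows it. The sketch below is in Python and uses the question's numbering (code1 is the mandatory right-most field); Python writes named groups as (?P<name>...), whereas .NET uses (?<name>...). The parse helper is a made-up name, and this is only one way to do it, not the asker's final regex.

```python
import re

# Nested optional prefixes: code4 needs code3, code3 needs code2, and the
# right-most field (code1) is the only mandatory one.
PATTERN = re.compile(
    r"^(?:(?:(?:(?P<code4>\w*):)?(?P<code3>\w*):)?(?P<code2>\w*):)?(?P<code1>\w+)$"
)

def parse(text):
    """Return a dict of the named groups, or None if the string is invalid."""
    match = PATTERN.match(text)
    return match.groupdict() if match else None

for sample in ("ab:bc:de:fg", "bc::fg", "ab:::fg", "fg"):
    print(sample, "->", parse(sample))
```

Groups whose field is absent come back as None, and fields that are present but empty come back as empty strings.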

Related

regex not working correctly when the test is fine

For my database, I have a list of company numbers where some of them start with two letters. I have created a regex which should eliminate these from a query and according to my tests, it should. But when executed, the result still contains the numbers with letters.
Here is my regex, which I've tested on https://www.regexpal.com
([^A-Z+|a-z+].*)
I've tested it against numerous variations such as SC08093, ZC000191 and NI232312 which shouldn't match and don't in the tests, which is fine.
My SQL query looks like:
SELECT companyNumber FROM company_data
WHERE companyNumber ~ '([^A-Z+|a-z+].*)' order by companyNumber desc
To summarise, strings like SC08093 should not match as they start with letters.
I've read through the documentation for postgres but I couldn't seem to find anything regarding this. I'm not sure what I'm missing here. Thanks.
The ~ '([^A-Z+|a-z+].*)' condition does not work because ~ is a regex matching operation that returns true even upon a partial match (it does not require a full string match, so the pattern can match anywhere in the string). [^A-Z+|a-z+].* matches any character other than a letter from A to Z, +, | or a letter from a to z, and then zero or more of any characters, anywhere inside the string. In SC08093, for example, it matches starting at the first digit.
You may use
WHERE companyNumber NOT SIMILAR TO '[A-Za-z]{2}%'
Here, NOT SIMILAR TO returns the inverse result of the SIMILAR TO operation. This SIMILAR TO operator accepts patterns that are almost regex patterns, but are also like regular wildcard patterns. NOT SIMILAR TO '[A-Za-z]{2}%' means all records that start with two ASCII letters ([A-Za-z]{2}) and having anything after (%) are NOT returned and all others will be returned. Note that SIMILAR TO requires a full string match, same as LIKE.
Your pattern: [^A-Z+|a-z+].* means "a string where at least some characters are not A-Z" - to extend that to the whole string you would need to use an anchored regex as shown by S-Man (the group defined with (..) isn't really necessary btw)
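I would probably use a regex that specifies what the valid pattern is and then use !~ instead.
where company !~ '^[0-9].*$'
^[0-9].*$ means "starts with a digit" (the .* allows anything after it; "consists only of digits" would be ^[0-9]+$) and the !~ means "does not match"
or
where not (company ~ '^[0-9].*$')
Note that both forms return the rows that do not start with a digit; to keep only the numbers without a letter prefix, use ~ with the same pattern.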
Not start with a letter could be done with
WHERE company ~ '^[^A-Za-z].*'
The first ^ marks the beginning. The [^A-Za-z] says "no letter" (including small and capital letters).
Edit: Changed [A-z] into the more precise [A-Za-z] (Why is this regex allowing a caret?)
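The difference between the unanchored and the anchored pattern can be reproduced outside the database. The sketch below uses Python's re module as a stand-in for PostgreSQL's ~ operator (an assumption: both search for a match anywhere in the string unless the pattern is anchored). The two digit-only company numbers are made-up sample values.

```python
import re

numbers = ["SC08093", "ZC000191", "NI232312", "08093123", "12345678"]

# Pattern from the question: unanchored, so it can start matching mid-string.
loose = re.compile(r"[^A-Z+|a-z+].*")
# Anchored variant: the very first character must not be an ASCII letter.
anchored = re.compile(r"^[^A-Za-z].*")

loose_hits = [n for n in numbers if loose.search(n)]
anchored_hits = [n for n in numbers if anchored.search(n)]

print(loose_hits)     # every entry, because each one contains a digit somewhere
print(anchored_hits)  # only the entries that begin with a non-letter
```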

How should a string be matched with a regular expression in Objective C

I'm finding it hard to match strings using NSRegularExpression. Generic alpha characters are not a problem with [a-z] but if I need to match a word like 'import' I'm struggling to make it work. I'm sure I have to escape the word in some manner but I can't find any docs around this. A really basic example would be
{{import "hello"}}
where I want to get hold of the string: hello
edit: to clarify - 'hello' could be any string - it's the bit I want returned
This regular expression matches the text between the "-s in your example:
\{\{import "([^"]+)"\}\}
The match will be stored in the first match group.
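As a quick check of the pattern itself, here is a small Python sketch (not NSRegularExpression; the surrounding "before"/"after" text is made up). Keep in mind that inside an Objective-C string literal the backslashes and the double quotes would each need escaping again.

```python
import re

# Braces are escaped so they are taken literally; the quoted text is captured.
IMPORT = re.compile(r'\{\{import "([^"]+)"\}\}')

match = IMPORT.search('before {{import "hello"}} after')
print(match.group(1))
```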

Regex for letters, digits, no spaces

I'm trying to create a Regex to check for 6-12 characters, one being a digit, the rest being any characters, no spaces. Can Regex do this? I'm trying to do this in Objective-C and I'm not familiar with Regex at all. I've been reading a couple of tutorials, but most are for matching simple cases of a number, or a set of numbers, which is not exactly what I'm looking for. I can do it with methods, but I was wondering if that would be too slow, and I figured I could try learning something new.
asdfg1 == ok
asdfg 1 != ok
asdfgh != ok
123456 != ok
asdfasgdasgdasdfasdf != ok
use this regex ^(?=.*\d)(?=.*[a-zA-Z])[^ ]{6,12}$
It seems that you mean "letter" when you say "character", right? And (thanks to burning_LEGION for pointing that out) there may be only one digit?
In that case, use
^(?=\D*\d\D*$)[^\W_]{6,12}$
Explanation:
^ # Start of string
(?=\D*\d\D*$) # Assert that there is exactly one digit in the string
[^\W_] # Match a letter or digit (explanation below)
{6,12} # 6-12 times
$ # End of string
[^\W_] might look a little odd. How does it work? Well, \w matches any letter, digit or underscore. \W matches anything that \w doesn't match. So [^\W] (meaning "match any character that is not not alphanumeric/underscore") is essentially the same as \w, but by adding _ to this character class, we can remove the underscore from the list of allowed characters.
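To see how this pattern behaves on the examples from the question, here is a short Python sketch. The re.ASCII flag is an assumption added so that \d, \D and \W stay limited to ASCII, which is what the explanation above implies; without it Python would also accept non-ASCII letters and digits.

```python
import re

# Exactly one digit (lookahead), then 6-12 ASCII letters or digits in total.
ONE_DIGIT = re.compile(r"^(?=\D*\d\D*$)[^\W_]{6,12}$", re.ASCII)

def is_ok(candidate):
    return ONE_DIGIT.match(candidate) is not None

for candidate in ("asdfg1", "asdfg 1", "asdfgh", "123456", "asdfasgdasgdasdfasdf"):
    print(repr(candidate), is_ok(candidate))
```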
I didn't try it, but I think this is the answer:
(^[^\d\x20]*\d[^\d\x20]*$){6,12}
This is for one digit: ^[^\d\x20]{0,11}\d{1}[^\d\x20]{0,11}$ but I can't get it limited to 6-12 length. You can use another function to check the length first and, if it is from 6 to 12, check with the regex I wrote.
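The "check the length first, then apply the one-digit pattern" idea can be sketched as follows in Python. The is_valid helper is a made-up name. Note that this pattern only excludes digits and spaces from the non-digit positions, so punctuation would still be accepted.

```python
import re

# Exactly one digit, no spaces; the overall length is checked separately.
SINGLE_DIGIT = re.compile(r"^[^\d\x20]{0,11}\d{1}[^\d\x20]{0,11}$")

def is_valid(text):
    """Length 6-12, exactly one digit, no spaces."""
    return 6 <= len(text) <= 12 and SINGLE_DIGIT.match(text) is not None

for candidate in ("asdfg1", "asdfg 1", "asdfgh", "123456", "asdfasgdasgdasdfasdf"):
    print(repr(candidate), is_valid(candidate))
```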

Collect a word between two spaces in objective c

I'm trying to implement stuff similar to spell check, but I need to get the word that is limited by a space. EX: "HI HOW R U", I need to collect HI, HOW and so on as they type. i.e. After user hits HI and space I need to collect HI and do a spell check.
Check the documentation for NSString. You want the message componentsSeparatedByString:.
I don't know objective-C, but I'm fairly sure it'll have a Regexp library - although it'd be straightforward to code it without one.
Regexp: \b([^\s])*\b
\b = word boundary (the position between a word character and a non-word character such as whitespace, comma, dot, exclamation mark, etc.)
\s = whitespace character
[...] = character set
[^...] = negated character set (any character(s) EXCEPT ...)
() = grouping construct
* = zero or more times
So the suggested expression would start matching at any word boundary, then match every subsequent character that is not a whitespace character, then match a word boundary.
Your stated case is so simple you may just want to look for spaces (one char at a time) and get the substring, but RegExp is very widely used across a range of languages and platforms, and so it's fairly easy to find an expression when you need to - and one often does for common stuff like checking if zip codes, phone numbers, email addresses and so on are syntactically correct. So it's worth learning in any case. :)
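As a language-neutral illustration of the "collect the word once the user hits space" idea, here is a small Python sketch (not Objective-C; completed_word is a made-up helper, and it assumes the text typed so far is available as one string).

```python
import re

def completed_word(typed):
    """Return the word just finished when the text ends with a space, else None."""
    if not typed.endswith(" "):
        return None
    # \S+ collects every run of non-whitespace characters.
    words = re.findall(r"\S+", typed)
    return words[-1] if words else None

print("HI HOW R U".split())
print(completed_word("HI "))
print(completed_word("HI HO"))
```

Splitting the whole string gives all the words at once, while the helper only reports a word at the moment it has been terminated by a space, which is when a spell check would run.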

Regular expression for extracting a number

I would like to be able to extract a number from within a string formatted as follows:
"<[1085674730]> hello foo1, how are you doing?"
I'm a novice with regular expressions, I only want to be able to extract a number that is enclosed in the greater/less-than and bracket symbols, but I'm not sure how to go about it. I have to match numeric digits only, but I'm not sure what syntax is used for only searching within these symbols.
UPDATE:
Thank you all for your input, and sorry for not being more specific. As I explained to kiamlaluno, I'm using VB.Net as the language for my application. I was wondering why some of the implementations were not working. In fact, the only one that did work was the one described by Matthew Flaschen. But that captures the symbols around the number as well as the number itself. I would like to only capture the number that is encased in the symbols and filter out the symbols themselves.
Use:
<\[(\d+)\]>
This is tested with ECMAScript regex.
It means:
\[ - literal [
( - open capturing group
\d - digit
+ - one or more
) - close capturing group
\] - literal ]
The overall functionality is to capture one or more digits surrounded by the given characters.
Combine Matthew's post with lookarounds http://www.regular-expressions.info/lookaround.html. This will exclude the prefix and suffix.
(?<=<\[)\d+(?=\]>)
I didn't test this regex but it should be very close to what you need. Double check at the link provided.
Hope this helps!
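The difference between the capturing-group pattern and the lookaround pattern can be seen side by side in this Python sketch (Python rather than VB.Net, purely for illustration; the fixed-width lookbehind used here is also supported by .NET).

```python
import re

subject = "<[1085674730]> hello foo1, how are you doing?"

# Capturing group: the whole match includes the symbols, group 1 is just the digits.
with_group = re.search(r"<\[(\d+)\]>", subject)
# Lookarounds: the symbols are asserted but not consumed, so the match is the number.
with_lookaround = re.search(r"(?<=<\[)\d+(?=\]>)", subject)

print(with_group.group(0), with_group.group(1))
print(with_lookaround.group(0))
```

With the first pattern you read the number from group 1 rather than from the whole match; with the second the whole match is already the bare number.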
$subject = "<[1085674730]> hello foo1, how are you doing?";
preg_match('/<\[(\d+)\]>/', $subject, $matches);
$matches[1] will contain the number you are looking for.
Use:
/<\[([[:digit:]]+)\]>/
If your implementation doesn't support the handy [:digit:] syntax, then use this:
/<\[([\d]+)\]>/
And if your implementation doesn't support the handy \d syntax, then use this:
/<\[([0-9]+)\]>/