How to check if VARCHAR contains at least one Uppercase letter - PostgreSQL - sql

I want to check if my VARCHAR(30) code contains more than 10 letters and has at least one uppercase letter. Here is how I wrote this:
code VARCHAR(30) CHECK(char_length(code) > 10 AND code LIKE '?=.*[A-Z]')
I used ?=.*[A-Z] regex with positive look ahead to check if there is uppercase letter in my code.
But I repeatedly get:
ERROR: new row for relation "vouchercode" violates check constraint "vouchercode_code_check"
Is my regex wrong?

You want a case sensitive regular expression. That would be:
check (code ~ '[A-Z]')
By default, ~ is case-sensitive. You would use ~* for the case-insensitive version.

Related

Regex like telephone number on Hive without prefix (+01)

We have a problem with a regular expression on hive.
We need to exclude the numbers with +37 or 0037 at the beginning of the record (it could be a false result on the regex like) and without letters or space.
We're trying with this one:
regexp_like(tel_number,'^\+37|^0037+[a-zA-ZÀÈÌÒÙ ]')
but it doesn't work.
Edit: we want it to come out from the select as true (correct number) or false.
To exclude numbers which start with +01 0r +001 or +0001 and having only digits without spaces or letters:
... WHERE tel_number NOT rlike '^\\+0{1,3}1\\d+$'
Special characters like + and character classes like \d in Hive should be escaped using double-slash: \\+ and \\d.
The general question is, if you want to describe a malformed telephone number in your regex and exclude everything that matches the pattern or if you want to describe a well-formed telephone number and include everything that matches the pattern.
Which way to go, depends on your scenario. From what I understand of your requirements, adding "not starting with 0037 or +37" as a condition to a well-formed telephone number could be a good approach.
The pattern would be like this:
Your number can start with either + or 00: ^(\+|00)
It cannot be followed by a 37 which in regex can be expressed by the following set of alternatives:
a. It is followed first by a 3 then by anything but 7: 3[0-689]
b. It is followed first by anything but 3 then by any number: [0-24-9]\d
After that there is a sequence of numbers of undefined length (at least one) until the end of the string: \d+$
Putting everything together:
^(\+|00)(3[0-689]|[0-24-9]\d)\d+$
You can play with this regex here and see if this fits your needs: https://regex101.com/r/KK5rjE/3
Note: as leftjoin has pointed out: To use this regex in hive you might need to additionally escape the backslashes \ in the pattern.
You can use
regexp_like(tel_number,'^(?!\\+37|0037)\\+?\\d+$')
See the regex demo. Details:
^ - start of string
(?!\+37|0037) - a negative lookahead that fails the match if there is +37 or 0037 immediately to the right of the current location
\+? - an optional + sign
\d+ - one or more digits
$ - end of string.

PostgreSQL "invalid regular expression: invalid escape \ sequence" when using Regex constraint

This is my SQL code:
CREATE TABLE country (
id serial NOT NULL PRIMARY KEY,
name varchar(100) NOT NULL CHECK(name ~ '^[-\p{L} ]{2,100}$'),
code varchar(3) NOT NULL
);
Notice the regex constraint at the name attribute. The code above will result in ERROR: invalid regular expression: invalid escape \ sequence.
I tried using escape CHECK(name ~ E'^[-\\p{L} ]{2,100}$') but again resulted in ERROR: invalid regular expression: invalid escape \ sequence.
I am also aware that if I do CHECK(name ~ '^[-\\p{L} ]{2,100}$'), or CHECK(name ~ E'^[-\p{L} ]{2,100}$'), - the SQL will receive wrong Regex and therefore will throw a constraint violation when inserting valid data.
Does PostgreSQL regex constraints not support regex patterns (\p) or something like that?
Edit #1
The Regex ^[-\p{L} ]{2,100}$ is basically allows country name that are between 2-100 characters and the allowed characters are hyphen, white-space and all letters (including latin letters).
NOTE: The SQL runs perfectly fine during the table creation but will throw the error when inserting valid data.
Additional Note: I am using PostgreSQL 12.1
The \p{L} Unicode category (property) class matches any letter, but it is not supported in PostgreSQL regex.
You may get the same behavior using a [:alpha:] POSIX character class
'^[-[:alpha:] ]{2,100}$'

regex not working correctly when the test is fine

For my database, I have a list of company numbers where some of them start with two letters. I have created a regex which should eliminate these from a query and according to my tests, it should. But when executed, the result still contains the numbers with letters.
Here is my regex, which I've tested on https://www.regexpal.com
([^A-Z+|a-z+].*)
I've tested it against numerous variations such as SC08093, ZC000191 and NI232312 which shouldn't match and don't in the tests, which is fine.
My sql query looks like;
SELECT companyNumber FROM company_data
WHERE companyNumber ~ '([^A-Z+|a-z+].*)' order by companyNumber desc
To summerise, strings like SC08093 should not match as they start with letters.
I've read through the documentation for postgres but I couldn't seem to find anything regarding this. I'm not sure what I'm missing here. Thanks.
The ~ '([^A-Z+|a-z+].*)' does not work because this is a [^A-Z+|a-z+].* regex matching operation that returns true even upon a partial match (regex matching operation does not require full string match, and thus the pattern can match anywhere in the string). [^A-Z+|a-z+].* matches a letter from A to Z, +,|or a letter fromatoz`, and then any amount of any zero or more chars, anywhere inside a string.
You may use
WHERE companyNumber NOT SIMILAR TO '[A-Za-z]{2}%'
See the online demo
Here, NOT SIMILAR TO returns the inverse result of the SIMILAR TO operation. This SIMILAR TO operator accepts patterns that are almost regex patterns, but are also like regular wildcard patterns. NOT SIMILAR TO '[A-Za-z]{2}%' means all records that start with two ASCII letters ([A-Za-z]{2}) and having anything after (%) are NOT returned and all others will be returned. Note that SIMILAR TO requires a full string match, same as LIKE.
Your pattern: [^A-Z+|a-z+].* means "a string where at least some characters are not A-Z" - to extend that to the whole string you would need to use an anchored regex as shown by S-Man (the group defined with (..) isn't really necessary btw)
I would probably use a regex that specifies want the valid pattern is and then use !~ instead.
where company !~ '^[0-9].*$'
^[0-9].*$ means "only consists of numbers" and the !~ means "does not match"
or
where not (company ~ '^[0-9].*$')
Not start with a letter could be done with
WHERE company ~ '^[^A-Za-z].*'
demo: db<>fiddle
The first ^ marks the beginning. The [^A-Za-z] says "no letter" (including small and capital letters).
Edit: Changed [A-z] into the more precise [A-Za-z] (Why is this regex allowing a caret?)

Oracle REGEXP_LIKE doesn't work as expected

I was testing a regular expression in Oracle SQL and found something I could not understand:
-- NO MATCH
SELECT 1 FROM DUAL WHERE REGEXP_LIKE ('Professor Frank', '(^|\s)Prof[^\s]*(\s|$)');
Above doesn't match, while the following matches:
-- MATCH
SELECT 1 FROM DUAL WHERE REGEXP_LIKE ('Professor Frank', '(^|\s)Prof\S*(\s|$)');
In other regex flavors, It will be like \bProf[^\s]*\b versus \bProf\S*\b and have similar results. Note: Oracle SQL regex does not have \b or word boundary.
Question: Why don't [^\s]* and \S* work the same way in Oracle SQL?
I notice if I remove the (\s|$) at the end, the first regex will match.
In Oracle regular expressions, \s is indeed the escape sequence for a space, but NOT in a matching character set (that is, [.....], or [^....] for excluding one character). In a matching character set, only two characters have a special meaning, - for ranges and ] for closing the set enumeration. They can't be escaped; if needed in the matching set, ] must always be the first character right after the opening [ (it is the ONLY position in which a closing ] stands for itself as a character, and does not denote the end of the matching set), and - must be first or last (best to leave it always to the end of the matching set) - anywhere else it is seen as a range marker. To include (or exclude, if using the [^.....] syntax) a space, just type an actual physical space in the matching set.
Edit: What I said above is not entirely right. There is another special character in a matching set, namely ^. If it is used in the first position, it means "match any character OTHER THAN." In any other position it stands for itself. For example, '[^^]' will match any single character OTHER THAN ^ (the first ^ has special meaning, the second stands in for itself). And, a closing bracket ] stands for itself if it is the second character in brackets, if the first character is ^ (with its SPECIAL meaning). That is, to match any single character OTHER THAN ], we can use the matching pattern '[^]]'.

Using RegExp in sql to find rows that only contain 'x'

How do i use a regexp to only find rows where the first name only includes one type of character 'x' but it doesnt matter how many characters there are.
So far I came up with:
REGEXP_LIKE(LOWER(fst_name),'^x+$'))
possible rows I am looking for:
'x'
'xx'
'xxx'
'xxxxxxxxx'
So im interpreting this as meaning find the rows where x is at the beginning and the end of the field and there can be only x's inbetween. Am I interpreting this correctly?
or is it possible to have: 'xxxxxxaxxxxx'
Your regex is correct:
^x+$
^ is the "start" anchor
x is the character for which you are searching. I assume it isn't a regex metacharacter
+ is the "one or more" quantifier
$ is the "end" anchor
So I would interpret your regex to match all of the cases you supplied, and would not match something like 'xxxxaxxxx'. http://regex101.com/r/dE8vU6
It's been long enough since I used Oracle that I don't recall whether your REGEX_LIKE syntax is correct there, but it seems right to me.