PostgreSQL "invalid regular expression: invalid escape \ sequence" when using Regex constraint - sql

This is my SQL code:
CREATE TABLE country (
id serial NOT NULL PRIMARY KEY,
name varchar(100) NOT NULL CHECK(name ~ '^[-\p{L} ]{2,100}$'),
code varchar(3) NOT NULL
);
Notice the regex constraint on the name column. The code above results in ERROR: invalid regular expression: invalid escape \ sequence.
I tried the escape-string form CHECK(name ~ E'^[-\\p{L} ]{2,100}$'), but it resulted in the same ERROR: invalid regular expression: invalid escape \ sequence.
I am also aware that if I use CHECK(name ~ '^[-\\p{L} ]{2,100}$') or CHECK(name ~ E'^[-\p{L} ]{2,100}$'), PostgreSQL receives the wrong regex and therefore reports a constraint violation when inserting valid data.
Do PostgreSQL regex constraints not support patterns such as \p?
Edit #1
The regex ^[-\p{L} ]{2,100}$ allows country names that are 2 to 100 characters long, where the allowed characters are hyphens, spaces, and all letters (including Latin letters).
NOTE: The SQL runs perfectly fine during the table creation but will throw the error when inserting valid data.
Additional Note: I am using PostgreSQL 12.1

The \p{L} Unicode category (property) class matches any letter, but it is not supported in PostgreSQL regex.
You can get similar behavior by using the [:alpha:] POSIX character class inside the bracket expression:
'^[-[:alpha:] ]{2,100}$'
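If the same rule also has to be checked in application code, it can be mirrored outside the database. The sketch below is only an illustration in Python, which is an assumption since the question does not mention a client language, and is_valid_country_name is a hypothetical helper. Python's built-in re module does not support \p{L} either, so the sketch relies on str.isalpha, which is true for characters in the Unicode letter categories.

```python
def is_valid_country_name(name: str) -> bool:
    """Hypothetical client-side mirror of the CHECK constraint."""
    # Length rule from the original pattern: {2,100}
    if not 2 <= len(name) <= 100:
        return False
    # Allowed characters: Unicode letters, hyphen, space
    return all(ch.isalpha() or ch in "- " for ch in name)


print(is_valid_country_name("Guinea-Bissau"))  # True
print(is_valid_country_name("Österreich"))     # True
print(is_valid_country_name("Area 51"))        # False: digits are not allowed
```

The database constraint remains the source of truth; a check like this only gives earlier feedback to the caller.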

Related

Remove special characters and alphabets from a string except number in sql query in db2

Hi I tried using Regex_replace and it is still not working.
select CASE WHEN sbbb <> ' ' THEN regexp_replace(sbbb,'[a-zA-Z _-#]','')
ELSE sbbb
END AS ABCDF
from Table where sccc=1;
This is the query I am using to remove letters and special characters from a string and keep only the numbers, but it does not work. The query returns the complete string, with numbers, letters, and special characters. What is wrong with the query above?
I am working on a SQL query. A column in the database contains letters, special characters, and numbers. I want to keep only the numbers and remove all the special characters and letters. How can I do this in a DB2 query? PATINDEX does not work. Please help.
The allowed regular expression patterns are listed on this page
Regular expression control characters
Outside of a set, the following must be preceded by a backslash to be treated as literals:
* ? + [ ( ) { } ^ $ | \ . /
Inside a set, the following must be preceded by a backslash to be treated as literals:
Characters that must be quoted to be treated as literals are [ ] \
Characters that might need to be quoted, depending on the context, are - &
In your pattern the - between _ and # sits where a range operator would go, so it needs a backslash to be read as a literal hyphen. This should work:
regexp_replace(sbbb,'[a-zA-Z _\-#]','')
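The effect of the unescaped hyphen can be reproduced in other regex engines. The sketch below uses Python's re module purely as a stand-in (an assumption; DB2's engine may react differently, for example by not raising an error), and compiles and keep_digits are hypothetical helpers. In Python, _-# is parsed as a range running from _ down to #, which is rejected, while the escaped form compiles. If the goal is only to keep digits, a negated class avoids listing the unwanted characters at all.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def keep_digits(text: str) -> str:
    """Remove every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", text)


print(compiles(r"[a-zA-Z _-#]"))   # False: "_-#" is read as a reversed range
print(compiles(r"[a-zA-Z _\-#]"))  # True: the hyphen is a literal
print(keep_digits("ab-12_#3 x"))   # 123
```

The negated pattern '[^0-9]' could be passed to regexp_replace in the same way as the escaped pattern above.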

How to check if VARCHAR contains at least one Uppercase letter - PostgreSQL

I want to check if my VARCHAR(30) code contains more than 10 letters and has at least one uppercase letter. Here is how I wrote this:
code VARCHAR(30) CHECK(char_length(code) > 10 AND code LIKE '?=.*[A-Z]')
I used ?=.*[A-Z] regex with positive look ahead to check if there is uppercase letter in my code.
But I repeatedly get:
ERROR: new row for relation "vouchercode" violates check constraint "vouchercode_code_check"
Is my regex wrong?
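LIKE does not interpret regular expressions; it only understands the % and _ wildcards, so '?=.*[A-Z]' is compared as literal text. You want a case-sensitive regular expression match instead, using the ~ operator: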
check (code ~ '[A-Z]')
By default, ~ is case-sensitive. You would use ~* for the case-insensitive version.
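The two conditions can be checked the same way outside the database. A minimal sketch in Python (an assumption, used only for illustration; is_valid_code is a hypothetical name) that mirrors char_length(code) > 10 AND code ~ '[A-Z]':

```python
import re


def is_valid_code(code: str) -> bool:
    """Mirror of the constraint: longer than 10 characters, at least one A-Z."""
    # re.search looks for a match anywhere in the string, like the ~ operator
    return len(code) > 10 and re.search(r"[A-Z]", code) is not None


print(is_valid_code("abcdefghijK"))  # True
print(is_valid_code("abcdefghijk"))  # False: no uppercase letter
print(is_valid_code("ABC"))          # False: too short
```

No lookahead is needed, because an unanchored search already succeeds as soon as one uppercase letter appears anywhere in the value.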

PostgreSQL RETURNING fails with REGEXP_REPLACE

I'm running PostgreSQL 9.4 and am inserting a lot of records into my database. I use the RETURNING clause to get values back for further use after an insert.
When I simply run:
... RETURNING my_car, brand, color, contact
everything works, but if I try to use REGEXP_REPLACE:
... RETURNING my_car, brand, color, REGEXP_REPLACE(contact, '^(\+?|00)', '') AS contact
it fails with:
ERROR: invalid regular expression: quantifier operand invalid
If I run the query directly in PostgreSQL, it works and returns the output I expect.
I tried to reproduce this and could not:
t=# create table s1(t text);
CREATE TABLE
t=# insert into s1 values ('+4422848566') returning REGEXP_REPLACE(t, '^(\+?|00)', '');
regexp_replace
----------------
4422848566
(1 row)
INSERT 0 1
So I elaborated on the reason @pozs suggested:
set standard_conforming_strings to off;
leads to
WARNING: nonstandard use of escape in a string literal
LINE 1: ...alues ('+4422848566') returning REGEXP_REPLACE(t, '^(\+?|00)...
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: invalid regular expression: quantifier operand invalid
Update
As the OP says, standard_conforming_strings is on when working with psql (the default since 9.1, as expected) and off when working with pg-promise.
Update from vitaly-t
The issue is simply with the JavaScript literal escaping, not with the flag.
He elaborates further in his answer.
The current value of the standard_conforming_strings setting is inconsequential here. You can see this if you prefix your query with SET standard_conforming_strings = true;, which changes nothing.
Passing in a regex string unescaped from the client is the same as using the E prefix from the command line: E'^(\+?|00)'.
In JavaScript, \ is treated as a special symbol inside string literals, so you always have to write \\ to produce a single backslash, which is what your regular expression needs.
Other than that, pg-promise will escape everything correctly, here's an example:
db.any("INSERT INTO users(name) VALUES('hello') RETURNING REGEXP_REPLACE(name, $1, $2)", ['^(\\+?|00)', 'replaced'])
To understand how the command-line works, prefix the regex string with E:
db.any("INSERT INTO users(name) VALUES('hello') RETURNING REGEXP_REPLACE(name, E$1, $2)", ['^(\\+?|00)', 'replaced'])
And you will get the same error: invalid regular expression: quantifier operand invalid.
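Why a lost backslash produces this particular error can be seen in any regex engine: once \+ has turned into a bare +, the quantifier has nothing in front of it to repeat. The sketch below uses Python's re module as a stand-in for PostgreSQL's engine (an assumption; error messages and alternation rules differ between the two, so only the + prefix case is shown).

```python
import re

# What the engine receives when the backslash survives: a literal "+", made optional
good = r"^(\+?|00)"
# What it receives when the backslash was consumed by string-literal escaping
bad = "^(+?|00)"

print(re.sub(good, "", "+4422848566"))  # 4422848566

try:
    re.compile(bad)
    bad_compiles = True
except re.error:
    # "+" directly after "(" has no operand to quantify
    bad_compiles = False

print(bad_compiles)  # False
```

The second pattern is the counterpart of what the server ends up compiling when the backslash is lost on the way there.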

Removing replacement character � from column

Based on my research so far this character indicates bad encoding between the database and front end. Unfortunately, I don't have any control over either of those. I'm using Teradata Studio.
How can I filter this character out? I'm trying to perform a REGEX_SUBSTR function on a column that occasionally contains �, which throws the error "The string contains an untranslatable character".
Here is my SQL. AIRCFT_POSITN_ID is the column that contains the replacement character.
SELECT DISTINCT AIRCFT_POSITN_ID,
REGEXP_SUBSTR(AIRCFT_POSITN_ID, '[0-9]+') AS AUTOROW
FROM PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL
WHERE DFRL_CREATE_TMS > CURRENT_DATE -25
Your diagnosis is correct, so first of all, you might want to check the Session Character Set (it is part of the connection definition).
If it is ASCII, change it to UTF8 and you will be able to see the original characters instead of the substitute character.
In case the character is indeed part of the data and not just an indication of encoding translation issues:
The substitute character, AKA SUB (DEC: 26, HEX: 1A), is quite unique in Teradata.
You cannot use it directly:
select '�';
-- [6706] The string contains an untranslatable character.
select '1A'XC;
-- [6706] The string contains an untranslatable character.
If you are using version 14.0 or above you can generate it with the CHR function:
select chr(26);
If you're below version 14.0 you can generate it like this:
select translate (_unicode '05D0'XC using unicode_to_latin with error);
Once you have generated the character, you can use it with OREPLACE or OTRANSLATE:
create multiset table t (i int,txt varchar(100) character set latin) unique primary index (i);
insert into t (i,txt) values (1,translate ('Hello שלום world עולם' using unicode_to_latin with error));
select * from t;
-- Hello ���� world ����
select otranslate (txt,chr(26),'') from t;
-- Hello world
select otranslate (txt,translate (_unicode '05D0'XC using unicode_to_latin with error),'') from t;
-- Hello world
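The same clean-up can be done on the client once the text has been fetched, if changing the query is not an option. A minimal sketch in Python (an assumption; the question only mentions Teradata Studio), where strip_sub is a hypothetical helper that removes the SUB control character (decimal 26) and also the U+FFFD replacement character that a client may display in its place:

```python
SUB = chr(26)           # the substitute character, hex 1A
REPLACEMENT = "\ufffd"  # the Unicode replacement character a client may show instead


def strip_sub(text: str) -> str:
    """Drop SUB and U+FFFD characters, keeping everything else."""
    return text.replace(SUB, "").replace(REPLACEMENT, "")


cleaned = strip_sub("Hello " + SUB * 4 + " world")
print(repr(cleaned))  # 'Hello  world' (both spaces around the removed run remain)
```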
BTW, there are two versions of OTRANSLATE and OREPLACE:
The functions under syslib work with LATIN.
The functions under TD_SYSFNLIB work with UNICODE.
In addition to Dudu's excellent answer above, I wanted to add the following now that I've encountered the issue again and had more time to experiment. The following SELECT command produced an untranslatable character:
SELECT IDENTIFY FROM PROD_MAE_MNTNC_VW.SCHD_MNTNC;
IDENTIFY
24FEB1747659193DC330A163DCL�ORD
Trying to perform a REGEXP_REPLACE or OREPLACE directly on this character produces an error:
Failed [6706 : HY000] The string contains an untranslatable character.
I changed the CHARSET property in my Teradata connection from UTF8 to ASCII, and I could now see the offending character, which looks like a tab:
IDENTIFY
The TRANSLATE_CHK function with this specific conversion succeeds and identifies the position of the offending character (note that this does not work with the UTF8 charset):
TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) AS BADCHAR
BADCHAR
28
Now this character can be dealt with using a CASE expression that keeps the part of the string before the bad character and drops everything from the bad character onward:
CASE WHEN TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) = 0 THEN IDENTIFY
ELSE SUBSTR(IDENTIFY, 1, TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE)-1)
END AS IDENTIFY
Hope this helps someone out.

Regular Expression to return when invalid character found

I have the following regex that checks for a list of valid characters:
^([a-zA-Z0-9+?/:().,' -]){1,35}$
What I need to do now is search for any existing column values in our DB that violate the above regex. I'm using the Oracle SQL REGEXP_LIKE condition.
The problem I have is I can't seem to negate the above expression and return a value when it finds a character not in the expression e.g.
"a-valid-filename.xml" => this shouldn't be returned as it's valid.
"an_invalid-filename.xml" => I need to find these i.e. anything with an invalid character.
The obvious answer to me is to define a list of invalid characters... but that could be a long list.
You can match it against the following regex which uses the [^...] negation character class:
([^a-zA-Z0-9+?/:().,' -])
This will match any single character that is not part of the list of characters that are allowed.
You can negate a character class by inserting a caret as the first character.
Example:
[^y]
The above will match any single character that is not y.
Try this:
where not regexp_like(col, '^([a-zA-Z0-9+?/:().,'' -]){1,35}$')
or
where regexp_like(col, '[^a-zA-Z0-9+?/:().,'' -]')
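The two queries are not quite equivalent: the first also flags values that are longer than 35 characters, while the second only looks for a disallowed character. A minimal sketch of both checks in Python (an assumption, used only to illustrate the logic; the function names are hypothetical):

```python
import re

ALLOWED = r"[a-zA-Z0-9+?/:().,' -]"
NOT_ALLOWED = r"[^a-zA-Z0-9+?/:().,' -]"


def fails_full_pattern(value: str) -> bool:
    """Counterpart of: NOT REGEXP_LIKE(col, '^(...){1,35}$')."""
    return re.fullmatch(f"{ALLOWED}{{1,35}}", value) is None


def first_invalid_char(value: str):
    """Counterpart of: REGEXP_LIKE(col, '[^...]'); returns the offending character or None."""
    match = re.search(NOT_ALLOWED, value)
    return match.group(0) if match else None


print(first_invalid_char("a-valid-filename.xml"))     # None
print(first_invalid_char("an_invalid-filename.xml"))  # _
print(fails_full_pattern("x" * 36))                   # True: too long
print(first_invalid_char("x" * 36))                   # None: every character is allowed
```

Which form fits depends on whether over-long values should be reported along with values containing invalid characters.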