PostgreSQL RETURNING fails with REGEXP_REPLACE - sql

I'm running PostgreSQL 9.4 and are inserting a lot of records into my database. I use the RETURNING clause for further use after an insert.
When I simply run:
... RETURNING my_car, brand, color, contact
everything works, but if I try to use REGEXP_REPLACE it fails:
... RETURNing my_car, brand, color, REGEXP_REPLACE(contact, '^(\+?|00)', '') AS contact
it fails with:
ERROR: invalid regular expression: quantifier operand invalid
If I simply run the query directly in PostgreSQL it does work and return a nice output.

Tried to reproduce and failed:
t=# create table s1(t text);
CREATE TABLE
t=# insert into s1 values ('+4422848566') returning REGEXP_REPLACE(t, '^(\+?|00)', '');
regexp_replace
----------------
4422848566
(1 row)
INSERT 0 1
So elaborated #pozs suggested reason:
set standard_conforming_strings to off;
leads to
WARNING: nonstandard use of escape in a string literal
LINE 1: ...alues ('+4422848566') returning REGEXP_REPLACE(t, '^(\+?|00)...
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: invalid regular expression: quantifier operand invalid
update
As OP author says standard_conforming_strings is on as supposed from 9.1 by default working with psql and is off working with pg-prommise
update from vitaly-t
The issue is simply with the JavaScript literal escaping, not with the
flag.
He elaborates further in his answer

The current value of environment variable standard_conforming_strings is inconsequential here. You can see it if you prefix your query with SET standard_conforming_strings = true;, which will change nothing.
Passing in a regEx string unescaped from the client is the same as using E prefix from the command line: E'^(\+?|00)'.
In JavaScript \ is treated as a special symbol, and you simply always have to provide \\ to indicate the symbol, which is what needed for your regular expressions.
Other than that, pg-promise will escape everything correctly, here's an example:
db.any("INSERT INTO users(name) VALUES('hello') RETURNING REGEXP_REPLACE(name, $1, $2)", ['^(\\+?|00)', 'replaced'])
To understand how the command-line works, prefix the regex string with E:
db.any("INSERT INTO users(name) VALUES('hello') RETURNING REGEXP_REPLACE(name, E$1, $2)", ['^(\\+?|00)', 'replaced'])
And you will get the same error: invalid regular expression: quantifier operand invalid.

Related

Ingres SQL illegal pattern match and illegal escape sequence, unexpected results

I am querying a database that uses Ingres 10.2. I cannot seem to get any of my pattern matching code to work as I expect it to.
For example:
SELECT TOP 20 *
FROM table_name
WHERE variable_name IN ('XaKDG', 'XaKDH')
returns both XaKDG and XaKDH as expected.
However, when trying this:
SELECT TOP 5 *
FROM table_name
WHERE variable_name LIKE 'XaKD\[GH\]' escape '\'
I get the following errors:
ERROR '22025' 1315172
Illegal pattern match specified
Illegal ESCAPE sequence.
I am baffled, as the user guide states the following:
"To match any string starting with 0 through 4, followed by an uppercase letter, then a [, any two characters and a final ]:"
name like '\[01234\]\[A-Z\][__]' escape '\'
As far as I can tell, my query should be correct. I also tried without the escape characters at all, and with double escape characters. These didn't produce any errors, but also did not return anything.
I appreciate any help.
The User guide is here: https://supportactian.secure.force.com/help/servlet/fileField?id=0BEf3000000PLPf

PostgreSQL "invalid regular expression: invalid escape \ sequence" when using Regex constraint

This is my SQL code:
CREATE TABLE country (
id serial NOT NULL PRIMARY KEY,
name varchar(100) NOT NULL CHECK(name ~ '^[-\p{L} ]{2,100}$'),
code varchar(3) NOT NULL
);
Notice the regex constraint at the name attribute. The code above will result in ERROR: invalid regular expression: invalid escape \ sequence.
I tried using escape CHECK(name ~ E'^[-\\p{L} ]{2,100}$') but again resulted in ERROR: invalid regular expression: invalid escape \ sequence.
I am also aware that if I do CHECK(name ~ '^[-\\p{L} ]{2,100}$'), or CHECK(name ~ E'^[-\p{L} ]{2,100}$'), - the SQL will receive wrong Regex and therefore will throw a constraint violation when inserting valid data.
Does PostgreSQL regex constraints not support regex patterns (\p) or something like that?
Edit #1
The Regex ^[-\p{L} ]{2,100}$ is basically allows country name that are between 2-100 characters and the allowed characters are hyphen, white-space and all letters (including latin letters).
NOTE: The SQL runs perfectly fine during the table creation but will throw the error when inserting valid data.
Additional Note: I am using PostgreSQL 12.1
The \p{L} Unicode category (property) class matches any letter, but it is not supported in PostgreSQL regex.
You may get the same behavior using a [:alpha:] POSIX character class
'^[-[:alpha:] ]{2,100}$'

Postgres syntax error when using 'like' in single quote

I am getting a syntax error in a PostgreSQL query. I am working on a project developed in YII1, I am getting an error
CDbCommand failed to execute the SQL statement: SQLSTATE[42601]:
Syntax error: 7 ERROR: syntax error at or near "s" LINE 1: ...OT NULL
AND sub_heading like '%Women and Children's Voices%'.
As you can see above, I am using the like operator in single quotes, and in the string there is another single quote (Children's). So PostgreSQL is throwing me an error. Please provide me a solution to escape the string.
You can escape a single quote in a string by using another single quote (i.e., '' instead of '. Note that these are two ' characters, not a single " character):
sub_heading LIKE '%Women and Children''s Voices%'
-- Here -----------------------------^
You should use the format function to construct the SQL statement, using the %L placeholder for the pattern.
I solved this problem by replacing the single quote with double quotes using PHP. Here is the code
There is a variable $var with value Women and Children's Voices. I replace that single quote using the str_replace() function.
$var = str_replace("'", "''", $var);

Removing replacement character � from column

Based on my research so far this character indicates bad encoding between the database and front end. Unfortunately, I don't have any control over either of those. I'm using Teradata Studio.
How can I filter this character out? I'm trying to perform a REGEX_SUBSTR function on a column that occasionally contains �, which throws the error "The string contains an untranslatable character".
Here is my SQL. AIRCFT_POSITN_ID is the column that contains the replacement character.
SELECT DISTINCT AIRCFT_POSITN_ID,
REGEXP_SUBSTR(AIRCFT_POSITN_ID, '[0-9]+') AS AUTOROW
FROM PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL
WHERE DFRL_CREATE_TMS > CURRENT_DATE -25
Your diagnostic is correct, so first of all, you might want to check the Session Character Set (it is part of the connection definition).
If it is ASCII change it to UTF8 and you will be able to see the original characters instead of the substitute character.
And in case the character is indeed part of the data and not just an indication for encoding translations issues:
The substitute character AKA SUB (DEC: 26 HEX: 1A) is quite unique in Teradata.
you cannot use it directly -
select '�';
-- [6706] The string contains an untranslatable character.
select '1A'XC;
-- [6706] The string contains an untranslatable character.
If you are using version 14.0 or above you can generate it with the CHR function:
select chr(26);
If you're below version 14.0 you can generate it like this:
select translate (_unicode '05D0'XC using unicode_to_latin with error);
Once you have generated the character you can now use it with REPLACE or OTRANSLATE
create multiset table t (i int,txt varchar(100) character set latin) unique primary index (i);
insert into t (i,txt) values (1,translate ('Hello שלום world עולם' using unicode_to_latin with error));
select * from t;
-- Hello ���� world ����
select otranslate (txt,chr(26),'') from t;
-- Hello world
select otranslate (txt,translate (_unicode '05D0'XC using unicode_to_latin with error),'') from t;
-- Hello world
BTW, there are 2 versions for OTRANSLATE and OREPLACE:
The functions under syslib works with LATIN.
the functions under TD_SYSFNLIB works with UNICODE.
In addition to Dudu's excellent answer above, I wanted to add the following now that I've encountered the issue again and had more time to experiment. The following SELECT command produced an untranslatable character:
SELECT IDENTIFY FROM PROD_MAE_MNTNC_VW.SCHD_MNTNC;
IDENTIFY
24FEB1747659193DC330A163DCL�ORD
Trying to perform a REGEXP_REPLACE or OREPLACE directly on this character produces an error:
Failed [6706 : HY000] The string contains an untranslatable character.
I changed the CHARSET property in my Teradata connection from UTF8 to ASCII and I could now see the offending character, looks like a tab
IDENTIFY
Using the TRANSLATE_CHK command using this specific conversion succeeds and identifies the position of the offending character (Note that this does not work using the UTF8 charset):
TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) AS BADCHAR
BADCHAR
28
Now this character can be dealt with using some CASE statements to remove the bad character and retain the remainder of the string:
CASE WHEN TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) = 0 THEN IDENTIFY
ELSE SUBSTR(IDENTIFY, 1, TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE)-1)
END AS IDENTIFY
Hopes this helps someone out.

How do you escape this regular expression?

I'm looking for
"House M.D." (2004)
with anything after it. I've tried where id~'"House M\.D\." \(2004\).*'; and there's no matches
This works id~'.*House M.D..*2004.*'; but is a little slow.
I suspect you're on an older PostgreSQL version that interprets strings in a non standards-compliant C-escape-like mode by default, so the backslashes are being treated as escapes and consumed. Try SET standard_conforming_strings = 'on';.
As per the lexical structure documentation on string constants, you can either:
Ensure that standard_conforming_strings is on, in which case you must double any single quotes (ie ' becomes '') but backslashes aren't treated as escapes:
id ~ '"House M\.D\." \(2004\)'
Use the non-standard, PostgreSQL-specific E'' syntax and double your backslashes:
id ~ E'"House M\\.D\\." \\(2004\\)'
PostgreSQL versions 9.1 and above set standard_conforming_strings to on by default; see the documentation.
You should turn it on in older versions after testing your code, because it'll make updating later much easier. You can turn it on globally in postgresql.conf, on a per-user level with ALTER ROLE ... SET, on a per-database level with ALTER DATABASE ... SET or on a session level with SET standard_conforming_strings = on. Use SET LOCAL to set it within a transaction scope.
Looks that your regexp is ok
http://sqlfiddle.com/#!12/d41d8/113
CREATE OR REPLACE FUNCTION public.regexp_quote(IN TEXT)
RETURNS TEXT
LANGUAGE plpgsql
STABLE
AS $$
/*******************************************************************************
* Function Name: regexp_quote
* In-coming Param:
* The string to decoded and convert into a set of text arrays.
* Returns:
* This function produces a TEXT that can be used as a regular expression
* pattern that would match the input as if it were a literal pattern.
* Description:
* Takes in a TEXT in and escapes all of the necessary characters so that
* the output can be used as a regular expression to match the input as if
* it were a literal pattern.
******************************************************************************/
BEGIN
RETURN REGEXP_REPLACE($1, '([[\\](){}.+*^$|\\\\?-])', '\\\\\\1', 'g');
END;
$$
Test:
SELECT regexp_quote('"House M.D." (2004)'); -- produces: "House M\\.D\\." \\(2004\\)