I am trying to replicate the IF function from MySQL into PostgreSQL.
The syntax of IF function is IF(condition, return_if_true, return_if_false)
I created the following function:
CREATE OR REPLACE FUNCTION if(boolean, anyelement, anyelement)
RETURNS anyelement AS $$
BEGIN
CASE WHEN ($1) THEN
RETURN ($2);
ELSE
RETURN ($3);
END CASE;
EXCEPTION WHEN division_by_zero THEN
RETURN ($3);
END;
$$ LANGUAGE plpgsql;
It works for most inputs, such as if(2>1, 2, 1), but it raises an error for:
if( 5/0 > 0, 5, 0)
fatal error division by zero.
In my program I can't check the denominator, because the condition is provided by the user.
Is there any way around this? Maybe we could change the first parameter from boolean to something else, so that the exception is raised inside the function, where it can be caught and the fallback value returned.
PostgreSQL is following the standard
This behaviour appears to be specified by the SQL standard. This is the first time I've seen a case where it's a real problem, though; you usually just use a CASE expression or a PL/PgSQL BEGIN ... EXCEPTION block to handle it.
MySQL's default behaviour is dangerous and wrong. It only works that way to support older code that relies on this behaviour. It has been fixed in newer versions when strict mode is active (which it absolutely always should be) but unfortunately has not yet been made the default. When using MySQL, always enable STRICT_TRANS_TABLES or STRICT_ALL_TABLES.
ANSI-standard zero division is a pain sometimes, but it'll also protect against mistakes causing data loss.
SQL injection warning, consider re-design
If you're executing expressions from the user then you quite likely have SQL injection problems. Depending on your security requirements you might be able to live with that, but it's pretty bad if you don't totally trust all your users. Remember, your users could be tricked into entering the malicious code from elsewhere.
Consider re-designing to expose an expression builder to the user and use a query builder to create the SQL from the user expressions. This would be much more complicated, but secure.
If you can't do that, see if you can parse the expressions the user enters into an abstract syntax, validate it before execution, and then produce new SQL expressions based on the parsed expression. That way you can at least limit what they can write, so they don't slip any nasties into the expression. You can also rewrite the expression to add things like checks for zero division. Finding (or writing) parsers for algebraic expressions isn't likely to be hard, but it'll depend on what kinds of expressions you need to let users write.
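A minimal sketch of the validate-before-execution idea, written in Python purely for illustration (the question does not say what language the app is written in). It assumes user expressions are limited to arithmetic and comparisons over a fixed set of column names. `ALLOWED_NAMES` and `validate_expression` are hypothetical names, and Python's own `ast` parser stands in for a real SQL-expression parser, so treat this as a starting point rather than a complete defence.

```python
import ast

# Hypothetical whitelist of column names the user may reference.
ALLOWED_NAMES = {"a", "b"}

# Node types permitted in a user expression: arithmetic and comparisons only.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Compare, ast.BoolOp,
    ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
    ast.Gt, ast.GtE, ast.Lt, ast.LtE, ast.Eq, ast.NotEq,
    ast.And, ast.Or,
)


def validate_expression(text):
    """Parse text; return the denominators found, or raise ValueError if unsafe."""
    try:
        tree = ast.parse(text, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"cannot parse expression: {exc}") from exc

    denominators = []
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed construct: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            raise ValueError(f"unknown name: {node.id}")
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError("only numeric literals are allowed")
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Div):
            # Collect each denominator so the caller can add a zero check.
            # ast.unparse needs Python 3.9 or later.
            denominators.append(ast.unparse(node.right))
    return denominators


print(validate_expression("a / b > 0"))  # ['b']
```

The returned denominators are what the app would wrap in zero checks when it generates the final SQL; anything outside the whitelist is rejected before it gets near the database.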
At minimum, the app needs to be using a role ("user") that has only SELECT privileges on the tables, is not a superuser, and does not own the tables. That'll minimise the harm any SQL injection will cause.
CASE won't solve this problem as written
In any case, because you currently don't validate and can't inspect the expression from the user, you can't use the SQL-standard CASE expression to solve this. For if( a/b > 0, a, b) you'd usually write something like:
CASE
    WHEN b = 0 THEN b
    ELSE CASE
        WHEN a/b > 0 THEN a
        ELSE b
    END
END
This explicitly handles the zero denominator case, but is only possible when you can break the expression up.
Ugly workaround #1
An alternative solution would be to get Pg to return a placeholder instead of raising an exception for division by zero by defining a replacement division operator or function. This will only solve the divide-by-zero case, not others.
I wanted to return 'NaN' as that's the logical result. Unfortunately, 'NaN' compares as greater than numbers, not less than, and you want a less-than or false-like result.
regress=# SELECT NUMERIC 'NaN' > 0;
 ?column?
----------
 t
(1 row)
This means we have to use the icky hack of returning NULL instead:
CREATE OR REPLACE FUNCTION div_null_on_zero(numeric,numeric) RETURNS numeric AS $$
VALUES (CASE WHEN $2 = 0 THEN NULL ELSE $1/$2 END)
$$ LANGUAGE sql IMMUTABLE;
CREATE OPERATOR #/# (
    PROCEDURE = div_null_on_zero,
    LEFTARG = numeric,
    RIGHTARG = numeric
);
with usage:
regress=# SELECT 5 #/# 0, 5 #/# 0>0, CASE WHEN 5 #/# 0 > 0 THEN 5 ELSE 0 END;
 ?column? | ?column? | case
----------+----------+------
          |          |    0
(1 row)
Your app can rewrite '/' in incoming expressions into #/# or whatever operator name you choose pretty easily.
There's one pretty critical problem with this approach, and that's that #/# will have different precedence to / so expressions without explicit parentheses may not be evaluated as you expect. You might be able to get around this by creating a new schema, defining an operator named / in that schema that does your null-on-error trick, and then adding that schema to your search_path before executing user expressions. It's probably a bad idea, though.
Ugly workaround #2
Since you can't inspect the denominator, all I can think of is to wrap the whole thing in a DO block (Pg 9.0+) or PL/PgSQL function and catch any exceptions from the evaluation of the expression.
Erwin's answer provides a better example of this than I did, so I've removed mine. In any case, this is an awful and dangerous thing to do; do not do it. Your app needs to be fixed.
With a boolean argument, a division by zero will always throw an exception (and that's a good thing), before your function is even called. There is nothing you can do about it. It's already happened.
CREATE OR REPLACE FUNCTION if(boolean, anyelement, anyelement)
RETURNS anyelement LANGUAGE SQL AS
$func$
SELECT CASE WHEN $1 THEN $2 ELSE $3 END
$func$;
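The evaluation-order point is not specific to PostgreSQL. A Python analogue (an illustration only, not PostgreSQL code; `if_` and `if_lazy` are hypothetical names) shows the same effect: the argument is computed before the function body runs, so a handler inside the function never sees the error unless evaluation is deferred until inside the function.

```python
def if_(condition, if_true, if_false):
    """Eager version: condition has already been evaluated by the caller."""
    try:
        return if_true if condition else if_false
    except ZeroDivisionError:
        # Never reached for if_(5 / 0 > 0, ...): the error happens before the call.
        return if_false


def if_lazy(condition_fn, if_true, if_false):
    """Deferred version: the condition is evaluated inside the function."""
    try:
        return if_true if condition_fn() else if_false
    except ZeroDivisionError:
        return if_false


try:
    if_(5 / 0 > 0, 5, 0)
    eager_outcome = "returned"
except ZeroDivisionError:
    eager_outcome = "raised before the call"

lazy_outcome = if_lazy(lambda: 5 / 0 > 0, 5, 0)
print(eager_outcome, lazy_outcome)
```

Deferring evaluation is the only reason the second form can catch the error; in SQL the closest equivalent is handing the condition over as text.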
I would strongly advise against a function named if to begin with. IF is a keyword in PL/pgSQL. If you use user defined functions written in PL/pgSQL this will be very confusing.
Just use the standard SQL expression CASE directly.
The alternative would be to take a text argument and evaluate it with dynamic SQL.
Proof of concept
What you ask for would work like this:
CREATE OR REPLACE FUNCTION f_if(_expr text
                              , _true anyelement
                              , _else anyelement
                              , OUT result anyelement)
  RETURNS anyelement LANGUAGE plpgsql AS
$func$
BEGIN
   EXECUTE '
   SELECT CASE WHEN (' || _expr || ') THEN $1 ELSE $2 END' -- !! dangerous !!
   USING _true, _else
   INTO result;
EXCEPTION WHEN division_by_zero THEN
   result := _else;
   -- possibly catch more types of exceptions ...
END
$func$;
Test:
SELECT f_if('TRUE' , 1, 2) --> 1
,f_if('FALSE' , 1, 2) --> 2
,f_if('NULL' , 1, 2) --> 2
,f_if('1/0 > 0', 1, 2); --> 2
This is a big security hazard in the hands of untrusted users. Read @Craig's answer about making this more secure.
However, I fail to see how it can be made bulletproof and would never use it.
I created a UDF to test this point:
CREATE OR REPLACE function EDW_WEATHER.CHK_READING(p_reading VARCHAR2, P_SENSOR VARCHAR2 )
RETURNS VARCHAR2
LANGUAGE JAVASCRIPT
AS
$$
if (P_SENSOR == 'A') return p_reading;
$$
;
It runs correctly:
select EDW_WEATHER.CHK_READING('A', 'B');
By simply lowercasing the variable P_SENSOR as:
CREATE OR REPLACE function EDW_WEATHER.CHK_READING(p_reading VARCHAR2, p_sensor VARCHAR2 )
RETURNS VARCHAR2
LANGUAGE JAVASCRIPT
AS
$$
if (p_sensor == 'A') return p_reading;
$$
;
I get this when I run the UDF:
100132 (P0000): JavaScript execution error: Uncaught ReferenceError:
p_sensor is not defined in CHK_READING at 'if (p_sensor == 'A')
return p_reading;' position 0
My question is whether Snowflake really requires variables (used in "if" or "case" statements) to be in uppercase, or am I doing something wrong.
So there are two things. One is how your session handles database object names. By default, unquoted names are treated as upper case (in other words, case does not matter), and in that context double quotes mean "the case I typed is the case I intend, and white space and other characters are allowed in here too". This can be changed with session variables, but that leads to all sorts of trouble with views that use quotes when the case is respected in some sessions and not in others.
Then there is the point where that naming behavior intersects with UDF code, where case matters. Snowflake has gone with "the object's true name is what it must be called in the JavaScript".
So if you are accessing a passed-in parameter and you did not double-quote its name when you declared it, you will always have to refer to it SHOUTING style. But if you double-quote the parameter names when declaring the function (in a session where quotes are respected), then you can have lower-case variables in your JavaScript, which I show in the second example.
Thus, normally, for this function
CREATE OR REPLACE function EDW_WEATHER.CHK_READING(p_reading VARCHAR2, p_sensor VARCHAR2 )
the inputs are only P_READING & P_SENSOR,
and here, in this case,
CREATE OR REPLACE function EDW_WEATHER.CHK_READING("p_reading" VARCHAR2, "p_SeNsOr" VARCHAR2 )
the inputs are only p_reading & p_SeNsOr.
So you could change your function like this:
CREATE OR REPLACE function CHK_READING("p_reading" VARCHAR2, "p_sensor" VARCHAR2 )
RETURNS VARCHAR2
LANGUAGE JAVASCRIPT
AS
$$
if (p_sensor == 'A') return p_reading;
$$
;
and then happiness is true!
select CHK_READING('A', 'B');
Note above that you created the function as the object CHK_READING; you can also call it as chk_reading, because in SQL case does not matter (by default).
So, the last part, your question:
My question is whether Snowflake really requires variables (used in "if" or "case" statements) to be in uppercase, or am I doing something wrong.
It is not a matter of things in IFs or CASEs, but of wherever you use the input variable.
As mentioned by the doc here:
https://docs.snowflake.com/en/developer-guide/udf/javascript/udf-javascript-introduction.html#javascript-arguments-and-returned-values
Note that an unquoted identifier must be referenced with the capitalized variable name.
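The rule can be summarised with a small Python sketch. It is only a model of the naming behaviour described above under default session settings, not Snowflake code, and `javascript_name` is a hypothetical helper.

```python
def javascript_name(declared):
    """Return the name a UDF argument gets inside the JavaScript body.

    Models the default behaviour described above: a double-quoted
    identifier keeps its exact case, an unquoted one is upper-cased.
    """
    if len(declared) >= 2 and declared.startswith('"') and declared.endswith('"'):
        return declared[1:-1]
    return declared.upper()


print(javascript_name('p_sensor'))      # P_SENSOR
print(javascript_name('"p_SeNsOr"'))    # p_SeNsOr
```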
EDIT: I changed my example a bit because it was incorrect and misleading. Here is a more correct one (I hope so).
This is a complex problem to explain, so I'll try to be as clear as I can.
I have a CASE that returns a value according to a text filter by means of the LIKE operator.
I need to generate 1 column (class_of_event) with N possible values that classify one given string in N possible categories.
This set of values searched for by the LIKE operator will be used again and again in the script, and will be updated occasionally.
The script is more or less like this:
SELECT
    event,
    CASE
        WHEN
            event LIKE '%MURDER%' or
            event LIKE '%KILL%' or
            ... --and so on with many other possible values...
            event LIKE '%WAR%'
        THEN 'VIOLENCE'
        WHEN
            event LIKE '%MARRIAGE%' or
            event LIKE '%MARRIED%' or
            ... --and so on with many other possible values...
            event LIKE '%WIFE%'
        THEN 'RELATIONSHIP'
        ELSE NULL
    END class_of_event
FROM history_facts
I know I can use the pipe | instead of the OR operator, thus writing
CASE WHEN event LIKE '%MARRIAGE%|%MARRIED%|%WIFE%' THEN 'RELATIONSHIP' ELSE null END class_of_event
instead of the long list of OR operators.
Anyway, this could turn into a VERY LONG string, because I may want to enlarge the set of values to look for.
ALSO, this set of values will be used again in the (long) script, and it will be a problem if one day I have to rewrite them all consistently.
So I tried putting these values in the return value of a function:
CREATE OR REPLACE FUNCTION relationship_event()
RETURNS text AS
$$SELECT text '%MARRIAGE%|%MARRIED%|%WIFE%'$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;
and then using the following:
CASE WHEN event LIKE relationship_event() THEN 'RELATIONSHIP' ELSE null END class_of_event
This seemed a good solution because I could just define or update the function once at the beginning of the script and then use it everywhere I needed it.
The problem is that this method performs quite well in some cases and horribly in other cases.
So, is there a way to:
1) write a synthetic version of event LIKE 'a' OR event LIKE 'b' OR event LIKE 'c' OR...
2) and store the strings I am looking for in some "global variable" that I can rewrite only once and re-use everywhere in the script?
Thanks everybody, this is driving me crazy.
I think I can do this easily with SAS or Python, but can't achieve it on POSTGRESQL
I know I can use the pipe | instead of the OR operator, thus writing
No, you can not. LIKE does not support a pipe as an "or" operator.
You can simplify the expressions using an array:
SELECT event,
       CASE
         WHEN event ilike any (array['%MURDER%','%KILL%','%WAR%'])
           then 'VIOLENCE'
         WHEN event ilike any (array['%MARRIAGE%','%MARRIED%','%WIFE%'])
           then 'RELATIONSHIP'
       END as class_of_event
FROM history_facts;
You can put this into a function:
create or replace function map_event(p_input text)
  returns text
as
$$
  select CASE
           WHEN p_input ilike any (array['%MURDER%','%KILL%','%WAR%'])
             then 'VIOLENCE'
           WHEN p_input ilike any (array['%MARRIAGE%','%MARRIED%','%WIFE%'])
             then 'RELATIONSHIP'
         END;
$$
language sql
immutable;
Then you just need to call the function, rather than repeating the CASE expression:
select event,
map_event(event) as class_of_event
from history_facts;
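To make the two points above concrete, here is a Python sketch (an illustration, not PostgreSQL code). `like` is a simplified model of LIKE that treats only % and _ as wildcards, so a pipe is an ordinary character, and `map_event` mirrors the "any pattern in an array" approach with the patterns kept in one place. The names `like`, `CATEGORIES` and the Python `map_event` are hypothetical, and escape characters are not handled.

```python
import re


def like(value, pattern, case_insensitive=False):
    """Simplified model of SQL LIKE: % matches any run, _ matches one character.

    Every other character, including |, is matched literally.
    Escape characters are not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.fullmatch("".join(parts), value, flags) is not None


# Patterns kept in one place, mirroring the arrays in the answer above.
CATEGORIES = {
    "VIOLENCE": ["%MURDER%", "%KILL%", "%WAR%"],
    "RELATIONSHIP": ["%MARRIAGE%", "%MARRIED%", "%WIFE%"],
}


def map_event(event):
    """Return the first category with a matching pattern, else None (like CASE)."""
    for category, patterns in CATEGORIES.items():
        if any(like(event, p, case_insensitive=True) for p in patterns):
            return category
    return None


print(like("HIS WIFE LEFT", "%MARRIAGE%|%MARRIED%|%WIFE%"))  # False
print(map_event("His wife left"))                             # RELATIONSHIP
```

The first call returns False because the pattern only matches strings that contain the literal pipes as well as all three words in order.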
I've been facing an issue since yesterday and I can't understand why my SQL is not working.
This may be a simple error, since I'm a beginner in SQL, but I can't find where it is.
Here is what I'm trying to do:
CREATE FUNCTION test() RETURN integer AS $$
BEGIN
FOR i IN 1..5 LOOP
SELECT * from result WHERE id=i;
end loop;
RETURN 1;
END;
$$ LANGUAGE plpgsql;
This is just a simple loop like the ones I can find in the documentation, but I get this error:
Error report -
ERROR: syntax error at or near "RETURN" (this is the first RETURN statement in the function)
The database is PostgreSQL, version 9.4.5.
Why is it not working?
There are several problems, apart from the fact that the function isn't doing anything useful:
It must be RETURNS integer, not RETURN integer.
That's what causes the error.
The SELECT has no destination. Either add an INTO clause or discard the result with
PERFORM * from result WHERE id=i;
You should indent the code correctly, so that you can read and understand it.
Is it possible to define a default value that will be returned in case a CAST operation fails?
For example, so that:
SELECT CAST('foo' AS INTEGER)
Will return a default value instead of throwing an error?
There is no default value for a CAST:
A type cast specifies a conversion from one data type to another. PostgreSQL accepts two equivalent syntaxes for type casts:
CAST ( expression AS type )
expression::type
There is no room in the syntax for anything other than the expression to be cast and the desired target type.
However, you can do it by hand with a simple function:
create or replace function cast_to_int(text, integer) returns integer as $$
begin
return cast($1 as integer);
exception
when invalid_text_representation then
return $2;
end;
$$ language plpgsql immutable;
Then you can say things like cast_to_int('pancakes', 0) and get 0.
PostgreSQL also lets you create your own casts so you could do things like this:
create or replace function cast_to_int(text) returns integer as $$
begin
-- Note the double casting to avoid infinite recursion.
return cast($1::varchar as integer);
exception
when invalid_text_representation then
return 0;
end;
$$ language plpgsql immutable;
create cast (text as integer) with function cast_to_int(text);
Then you could say
select cast('pancakes'::text as integer)
and get 0 or you could say
select cast(some_text_column as integer) from t
and get 0 for the some_text_column values that aren't valid integers. If you wanted to cast varchars using this auto-defaulting cast then you'd have to double cast:
select cast(some_varchar::text as integer) from t
Just because you can do this doesn't make it a good idea. I don't think replacing the standard text to integer cast is the best idea ever. The above approach also requires you to leave the standard varchar to integer cast alone; you could get around that if you wanted to do the whole conversion yourself rather than lazily punting to the built-in casting.
NULL handling is left as an (easy) exercise for the reader.
Trap the error as described in the documentation and then specify an action to take instead.
See the PostgreSQL documentation on error trapping; a snippet is included below.
35.7.5. Trapping Errors
By default, any error occurring in a PL/pgSQL function aborts execution of the function, and indeed of the surrounding transaction as well. You can trap errors and recover from them by using a BEGIN block with an EXCEPTION clause. The syntax is an extension of the normal syntax for a BEGIN block:
[ <<label>> ]
[ DECLARE
    declarations ]
BEGIN
    statements
EXCEPTION
    WHEN condition [ OR condition ... ] THEN
        handler_statements
    [ WHEN condition [ OR condition ... ] THEN
          handler_statements
      ... ]
END;
If no error occurs, this form of block simply executes all the statements, and then control passes to the next statement after END. But if an error occurs within the statements, further processing of the statements is abandoned, and control passes to the EXCEPTION list. The list is searched for the first condition matching the error that occurred. If a match is found, the corresponding handler_statements are executed, and then control passes to the next statement after END. If no match is found, the error propagates out as though the EXCEPTION clause were not there at all: the error can be caught by an enclosing block with EXCEPTION, or if there is none it aborts processing of the function.
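The matching rule in the snippet has a close analogue in most languages with exceptions. A Python sketch (an analogy, not PL/pgSQL): the handler for a matching error supplies the default, while an error with no matching handler propagates to the caller. The function name `cast_to_int` echoes the earlier answer only in spirit.

```python
def cast_to_int(text, default):
    """Return int(text), or default when text is not a valid integer literal."""
    try:
        return int(text)
    except ValueError:
        # Plays the role of WHEN invalid_text_representation THEN ...
        return default


print(cast_to_int("42", 0))        # 42
print(cast_to_int("pancakes", 0))  # 0

# An error with no matching handler propagates to the caller, as it would
# out of a block whose EXCEPTION list has no matching condition.
try:
    cast_to_int(None, 0)
except TypeError:
    print("TypeError propagated")
```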
I am wondering if anyone could tell me whether the dot (.) has any special meaning in Informix expressions and the like.
For example in stored procedures I see it used with integers, decimals and chars and one thing that is bugging me quite a lot is this:
if value = '.' then
//do something
end if
The above expression evaluates to true when value is of type numeric (5,1) and is equal to 0.0.
I have tried looking around and I can't find information on how a dot is treated, but it seems " 0.0 = '.' " evaluates to true.
Can you show the data types and a working stored procedure that illustrates the problem?
There isn't supposed to be any special meaning to dot in that context. It is a string, and no numeric value should be equal to it; if the number is converted to a string, there will be either nothing (for NULL) or at least one digit, neither of which is the same as the string '.', and if the string '.' is converted to a number, that conversion should fail (arguably when the procedure is created, certainly at runtime).
One thing that puzzles me is that the syntax you are showing is not SPL syntax. SPL does not use 'end if', though I4GL does. Indeed, SPL (stored procedure language) only uses END in conjunction with a matching BEGIN.
It appears that my memory is failing and that I should not try reading manuals just before midnight.
It also appears that this code does what I would not expect...
+ set debug file to "/tmp/x1";
SET DEBUG FILE TO: Rows processed = 0
+ drop procedure so2139024();
DROP PROCEDURE: Rows processed = 0
+ create procedure so2139024() returning int as rv;
define value numeric(5,1);
define rv integer;
trace on;
let rv = 0;
let value = 0.0;
if value = '.' then
let rv = 1;
end if;
return rv;
end procedure;
CREATE PROCEDURE: Rows processed = 0
+ execute procedure so2139024();
1
EXECUTE PROCEDURE: Rows processed = 1
So, for some reason, the comparison is working; the value zero compares equal to dot. This was tested on MacOS X 10.6.2 with IBM Informix Dynamic Server 11.50.FC6 (and SQLCMD 86.04, built with CSDK 3.50.FC4, but running with 3.50.FC6).
The debug file contains:
trace on
expression:(= value, ".")
evaluates to t
let rv = 1
expression:rv
evaluates to 1
procedure so2139024 returns 1
iteration of cursory procedure so2139024
A priori, this should be reported via IBM/Informix Tech Support. I think it is most likely a bug of some sort, but I don't know how it is coming up with the answer. I will check through back-door channels too.
Back-door channels show that the likely problem is that the function deccvasc() in the ESQL/C library (also used internally by the server) is mishandling the string '.'.
The ESQL/C test code here shows that:
#include <stdio.h>
#include "dumpesql.h"

int main(void)
{
    dec_t d;
    int rc = deccvasc(".", 1, &d);
    printf("rc = %d\n", rc);
    dump_decimal(stdout, "Decimal", &d);
    return(0);
}
The dump_decimal() function is non-standard, but prints information from the decimal structure. The output is:
rc = 0
DECIMAL: Decimal -- address 0x7FFF5FBFF090
E: -64, S = 1 (+), N = 0, M = value is ZERO
Consequently, the server is (mistakenly) accepting '.' as a valid representation of zero, rather than getting an error reported. For the time being, you will have to edit the stored procedure to make more sense - it is not clear what the test was supposed to achieve, but it clearly isn't written correctly. (That is not denying that there is also a bug in the server.) Please report this to IBM/Informix Technical Support.
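For comparison only: a strict decimal parser treats a lone '.' as invalid rather than as zero. The sketch below uses Python's standard `decimal` module, which is unrelated to Informix's deccvasc(); `parse_decimal` is a hypothetical helper.

```python
from decimal import Decimal, InvalidOperation


def parse_decimal(text):
    """Return a Decimal for text, or None if text is not a valid number."""
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


print(parse_decimal("0.0"))  # 0.0
print(parse_decimal("."))    # None
print(parse_decimal(".5"))   # 0.5
```

A comparison such as value = '.' would then fail to convert instead of quietly matching zero.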