Verhoeff Checksum function for PostgreSQL

I've found a Verhoeff checksum function for PostgreSQL at: https://github.com/HIISORG/SNOMED-CT-PostgreSQL/blob/master/Verhoeff.sql
CREATE OR REPLACE FUNCTION verhoeff_generate (
input numeric = NULL::numeric
)
RETURNS smallint AS $$
DECLARE
_c SMALLINT := 0;
_m SMALLINT;
_i SMALLINT := 0;
_n VARCHAR(255);
-- Declare the lookup tables as flat strings (_d: multiplication, _p: permutation, _v: inverse)
_d CHAR(100) := '0123456789123406789523401789563401289567401239567859876043216598710432765982104387659321049876543210';
_p CHAR(80) := '01234567891576283094580379614289160435279453126870428657390127938064157046913258';
_v CHAR(10) := '0432156789';
BEGIN
_n := REVERSE(input::TEXT);
WHILE _i<length(_n) LOOP
_m := CAST(SUBSTRING(_p,(((_i + 1)%8)*10) + CAST(SUBSTRING(_n, _i+1, 1) AS SMALLINT) + 1, 1) AS SMALLINT);
_c := CAST (substring(_d, (_c *10 + _m + 1), 1) AS SMALLINT);
_i := _i + 1;
END LOOP;
RETURN CONCAT(input, CAST(substring(_v,_c+1,1) as SMALLINT));
END; $$
LANGUAGE 'plpgsql'
IMMUTABLE
RETURNS NULL ON NULL INPUT;
I've modified the RETURN, so that it would concatenate the INPUT with the Checksum digit:
RETURN CONCAT(input, CAST(substring(_v,_c+1,1) as SMALLINT));
And I get the error:
[2020-02-20 11:53:19] [22003] ERROR: value "331010000014" is out of range for type smallint
[2020-02-20 11:53:19] Where: PL/pgSQL function verhoeff_generate(numeric) while casting return value to function's return type
I've tried:
RETURN CONCAT(input, CAST(substring(_v,_c+1,1) as BIGINT));
Still getting the same error.

You've changed the return expression so that it no longer yields the original smallint but a string. CONCAT always outputs text: you can cast numbers as many times as you like before you feed them into CONCAT, but they will be converted to strings and then concatenated, and the result is a string no matter what you feed in.
CONCAT is now returning a string whose digits represent a number too large to fit into a smallint, a conversion that PostgreSQL attempts implicitly because of the function's declared return type. That is also why casting the check digit to BIGINT changes nothing. This represents the core problem:
CREATE OR REPLACE FUNCTION return_big_number ()
RETURNS smallint AS $$
BEGIN
RETURN '32769'; -- string of a number that is too big for a smallint
END; $$
LANGUAGE plpgsql;
'32769' is a string that cannot be converted to a smallint because the number is simply too large: smallint caps out at 32767. Similarly, by using CONCAT you're generating a string of digits that represents a number too large for a smallint.
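A quick way to see this is the sketch below. The input 33101000001 and the check digit 4 are assumptions inferred from the value in the error message, not something the question states.

```sql
-- concat() returns text regardless of the argument types it is given
SELECT pg_typeof(concat(33101000001::numeric, 4::smallint));  -- text

-- the text result holds 12 digits, far beyond smallint's maximum of 32767
SELECT concat(33101000001::numeric, 4::smallint);  -- 331010000014
```

Casting the second argument to SMALLINT or BIGINT only affects that one digit before it is turned into text, so it has no effect on the type of the result.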
Either change the function declaration at the top so that it returns a suitable string:
RETURNS smallint AS $$
^^^^^^^^
change this to perhaps "RETURNS text AS $$"
Or if having the output as numeric would suit you better, change the function so it declares to return a numeric datatype that can represent more digits than a smallint, and change the return value calculation to keep it numeric (multiply the input by some power of 10, and add the checksum, rather than changing the input to string and concatenating the checksum)
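The numeric option could look roughly like this. The helper name append_check_digit is hypothetical; inside verhoeff_generate the same expression would take the place of the CONCAT call, together with RETURNS numeric. It assumes a non-negative whole-number input.

```sql
-- Hypothetical helper: appends one check digit arithmetically, staying numeric.
CREATE OR REPLACE FUNCTION append_check_digit(input numeric, check_digit smallint)
RETURNS numeric AS $$
BEGIN
    -- shift the input one decimal place to the left, then add the digit
    RETURN input * 10 + check_digit;
END; $$
LANGUAGE plpgsql
IMMUTABLE
RETURNS NULL ON NULL INPUT;

SELECT append_check_digit(33101000001, 4::smallint);  -- 331010000014
```

Because nothing is converted to text, there is no implicit cast back to a number, and numeric has room for far more digits than smallint.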

Related

Improve aggregate function in PL/pgSQL

I've tried to create an aggregate function, which finds the minimum value in a column, then it adds the Laplacian noise.
I am using PostgreSQL's PL/pgSQL language.
The aggregate works perfectly, but I'd like to know if there is any way to improve the code that I wrote.
/*
* PL/pgSQL function that acts as the final step of a MIN(col) aggregate and then adds the Laplacian noise.
* For the sensitivity (which is the upper bound of the query), we use half of the maximum value of the column.
* The array passed in contains all the column values, which are compared to establish which one is the minimum.
* Then we compute the Laplacian noise (sensitivity/epsilon). This value is added to the minimum value to perturb
* the final result.
*/
CREATE OR REPLACE FUNCTION addLaplacianNoiseMin (real[]) RETURNS real AS $$
DECLARE
i real;
minVal real; --minimum value which is found in the column and then disturbed
laplaceNoise real; --laplacian noise, computed as half of the maximum value divided by an arbitrary epsilon (small value)
epsilon real := 1.2;
sensivity real; --our computed upper bound
maxVal real;
BEGIN
minVal := $1[1];
maxVal := $1[1];
IF ARRAY_LENGTH($1,1) > 0 THEN --Checking whether the array is empty or not
<<confrontoMinimo>>
FOREACH i IN ARRAY $1 LOOP --Looping through the entire array, passed as parameter
IF minVal >= i THEN
minVal := i;
ELSE
maxVal := i;
END IF;
END LOOP confrontoMinimo;
ELSE
RAISE NOTICE 'Invalid parameter % passed to the aggregate function',$1;
--Raising exception if the parameter passed as argument points to null.
RAISE EXCEPTION 'Cannot find MIN value. Parameter % is null', $1
USING HINT = 'You cannot pass a null array! Check the passed parameter';
END IF;
sensivity := maxVal/2;
laplaceNoise := sensivity/(epsilon);
RAISE NOTICE 'minVal: %, maxVal: %, sensivity: %, laplaceNoise: %', minVal, maxVal, sensivity,laplaceNoise;
minVal := laplaceNoise + minVal;
RETURN minVal;
END;
$$ LANGUAGE plpgsql;
CREATE AGGREGATE searchMinValueArray (real)
(
sfunc = array_append,
stype = real[],
finalfunc = addLaplacianNoiseMin,
initCond = '{}'
);
Yes, you can improve that by not using an array as state for the aggregate, but a composite type like:
CREATE TYPE aggstate AS (minval real, maxval real);
Then you can perform the operation from your loop in the SFUNC and don't have to keep an array in memory that can possibly become very large. The FINALFUNC would then become very simple.
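A minimal sketch of that idea follows. The names noisy_min_step, noisy_min_final and noisy_min are hypothetical, and the epsilon of 1.2 is copied from the question. LEAST and GREATEST track a true minimum and maximum, which is not exactly what the question's loop does (its ELSE branch assigns maxVal whenever a value is not a new minimum). On empty input this version returns NULL instead of raising an exception.

```sql
CREATE TYPE aggstate AS (minval real, maxval real);

-- State transition: fold one value into the running (min, max) pair.
-- LEAST and GREATEST ignore NULLs, so the initial NULL state needs no special case.
CREATE FUNCTION noisy_min_step(state aggstate, val real) RETURNS aggstate AS $$
BEGIN
    RETURN ROW(LEAST(state.minval, val), GREATEST(state.maxval, val))::aggstate;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Final step: same formula as in the question, minimum + (maximum / 2) / epsilon.
CREATE FUNCTION noisy_min_final(state aggstate) RETURNS real AS $$
DECLARE
    epsilon CONSTANT real := 1.2;
BEGIN
    RETURN state.minval + (state.maxval / 2) / epsilon;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE AGGREGATE noisy_min(real) (
    sfunc = noisy_min_step,
    stype = aggstate,
    finalfunc = noisy_min_final
);

-- min is 1 and max is 6, so the result is about 1 + (6 / 2) / 1.2 = 3.5
SELECT noisy_min(x) FROM unnest(ARRAY[3, 1, 6]::real[]) AS t(x);
```

The state is now two numbers per group no matter how many rows are aggregated, instead of an array that grows with the input.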

Generate unique random strings in plpgsql

I am trying to write a function to create unique random tokens of variable length. However, I am stumped by the plpgsql syntax. My intention is to create a function which
Takes a table and column as input
Generates a random string of a given length, with a given set of characters
Checks if the string is already in the column
If so (and this is expected to be rare), simply generate a new random string.
Otherwise, return the random string
My current attempt looks like this:
CREATE FUNCTION random_token(_table TEXT, _column TEXT, _length INTEGER) RETURNS text AS $$
DECLARE
alphanum CONSTANT text := 'abcdefghijkmnopqrstuvwxyz23456789';
range_head CONSTANT integer := 25;
range_tail CONSTANT integer := 33;
random_string text;
BEGIN
REPEAT
SELECT substring(alphanum from trunc(random() * range_head + 1)::integer for 1) ||
array_to_string(array_agg(substring(alphanum from trunc(random() * range_tail + 1)::integer for 1)), '')
INTO random_string FROM generate_series(1, _length - 1);
UNTIL random_string NOT IN FORMAT('SELECT %I FROM %I WHERE %I = random_string;', _column, _table, _column)
END REPEAT;
RETURN random_string;
END
$$ LANGUAGE plpgsql;
However, this doesn't work, and gives me a not very helpful error:
DatabaseError: error 'ERROR: syntax error at or near "REPEAT"
I have tried a number of variations, but without knowing what the error in the syntax is I am stumped. Any idea how to fix this function?
There is no REPEAT statement in PL/pgSQL. Use a simple LOOP instead.
CREATE OR REPLACE FUNCTION random_token(_table TEXT, _column TEXT, _length INTEGER) RETURNS text AS $$
DECLARE
alphanum CONSTANT text := 'abcdefghijkmnopqrstuvwxyz23456789';
range_head CONSTANT integer := 25;
range_tail CONSTANT integer := 33;
random_string text;
ct int;
BEGIN
LOOP
SELECT substring(alphanum from trunc(random() * range_head + 1)::integer for 1) ||
array_to_string(array_agg(substring(alphanum from trunc(random() * range_tail + 1)::integer for 1)), '')
INTO random_string FROM generate_series(1, _length - 1);
EXECUTE FORMAT('SELECT count(*) FROM %I WHERE %I = %L', _table, _column, random_string) INTO ct;
EXIT WHEN ct = 0;
END LOOP;
RETURN random_string;
END
$$ LANGUAGE plpgsql;
Note, random_string should be a parameter to format().
Update. According to the accurate hint from Abelisto, this should be faster for a large table:
DECLARE
dup boolean;
...
EXECUTE FORMAT('SELECT EXISTS(SELECT 1 FROM %I WHERE %I = %L)', _table, _column, random_string) INTO dup;
EXIT WHEN NOT dup;
...
This is almost certainly not what you want. When you say, "checks if the string is already in the column" you're not referring to something that looks unique, you're referring to something that actually is UNIQUE.
Instead, I would point you to this answer I gave about UUIDs.
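If you do want short random tokens rather than UUIDs, one way to get real uniqueness is to let a constraint decide and retry on conflict. This is only a sketch under assumptions: the table tokens, its column token and the function insert_unique_token are hypothetical names, and the md5-based generator is a stand-in for the alphabet logic above (it yields hex characters, at most 32 of them).

```sql
-- Hypothetical table; the primary key enforces uniqueness.
CREATE TABLE tokens (token text PRIMARY KEY);

CREATE OR REPLACE FUNCTION insert_unique_token(_length integer) RETURNS text AS $$
DECLARE
    candidate text;
BEGIN
    LOOP
        -- Stand-in generator: leading characters of an md5 hash.
        candidate := substr(md5(random()::text), 1, _length);
        BEGIN
            INSERT INTO tokens (token) VALUES (candidate);
            RETURN candidate;  -- the constraint accepted the value
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- duplicate: loop and try another candidate
        END;
    END LOOP;
END
$$ LANGUAGE plpgsql;

SELECT insert_unique_token(8);
```

Unlike a SELECT followed by a later INSERT, this leaves no window in which two sessions can both conclude that the same token is free.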

SQL function to convert NUMERIC to BYTEA and BYTEA to NUMERIC

In PostgreSQL, how can I convert a NUMERIC value to a BYTEA value? And BYTEA to NUMERIC? For TEXT values I can use CONVERT_TO() and CONVERT_FROM(). Is there anything similar? If not, what would the SQL function code look like?
Here are functions tested with PG 11. Note that numeric2bytea handles only nonnegative numbers.
CREATE OR REPLACE FUNCTION bytea2numeric(_b BYTEA) RETURNS NUMERIC AS $$
DECLARE
_n NUMERIC := 0;
BEGIN
FOR _i IN 0 .. LENGTH(_b)-1 LOOP
_n := _n*256+GET_BYTE(_b,_i);
END LOOP;
RETURN _n;
END;
$$ LANGUAGE PLPGSQL IMMUTABLE STRICT;
CREATE OR REPLACE FUNCTION numeric2bytea(_n NUMERIC) RETURNS BYTEA AS $$
DECLARE
_b BYTEA := '\x';
_v INTEGER;
BEGIN
WHILE _n > 0 LOOP
_v := _n % 256;
_b := SET_BYTE(('\x00' || _b),0,_v);
_n := (_n-_v)/256;
END LOOP;
RETURN _b;
END;
$$ LANGUAGE PLPGSQL IMMUTABLE STRICT;
Example:
=> select bytea2numeric('\xdeadbeef00decafbad00cafebabe');
bytea2numeric
------------------------------------
4516460495214885311638200605653694
(1 row)
=> select numeric2bytea(4516460495214885311638200605653694);
numeric2bytea
--------------------------------
\xdeadbeef00decafbad00cafebabe
(1 row)
I think VARBINARY is the SQL Server counterpart of bytea, so this applies to SQL Server rather than PostgreSQL.
To convert a number to binary there, use the following script:
select CONVERT(VARBINARY,10)
and the answer will be 0x0000000A
and to convert VARBINARY back to a number:
select CONVERT(int,0x0000000A)
and the answer will be 10

How to cut varchar/text before n'th occurrence of delimiter? PostgreSQL

I have strings (saved in the database as varchar) and I have to cut them just before the n'th occurrence of a delimiter.
Example input:
String: 'My-Example-Awesome-String'
Delimiter: '-'
Occurrence: 2
Output:
My-Example
I implemented this function as a quick prototype:
CREATE OR REPLACE FUNCTION find_position_delimiter(fulltext varchar, delimiter varchar, occurence integer)
RETURNS varchar AS
$BODY$
DECLARE
result varchar = '';
arr text[] = regexp_split_to_array( fulltext, delimiter);
word text;
counter integer := 0;
BEGIN
FOREACH word IN ARRAY arr LOOP
EXIT WHEN ( counter = occurence );
IF (counter > 0) THEN result := result || delimiter;
END IF;
result := result || word;
counter := counter + 1;
END LOOP;
RETURN result;
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;
SELECT find_position_delimiter('My-Example-Awesome-String', '-', 2);
For now it assumes that the string is not empty (guaranteed by the query that calls the function) and that the string contains at least one occurrence of the delimiter.
But now I need something better for a performance test. If possible, I would love to see the most universal solution, because not every user of my system works on a PostgreSQL database (a few of them prefer Oracle, MySQL or SQLite), but that is not the most important thing. Performance is, because in a specific search this function can be called a few hundred times.
I didn't find anything about a fast and easy way of treating a varchar as an array of chars and checking for occurrences of the delimiter (I could remember the positions of the occurrences and then take the substring from the first char to the n'th delimiter position - 1). Any ideas? Are there smarter solutions?
# EDIT: yes, I know that the function will be a bit different in every database, but the body of the function can be very similar or the same. Generality is not the main goal :) And sorry for the bad working name of the function, I just noticed it does not have the right meaning.
You can try doing something based on this:
select
varcharColumnName,
INSTR(varcharColumnName,'-',1,2),
case when INSTR(varcharColumnName,'-',1,2) <> 0
THEN SUBSTR(varcharColumnName, 1, INSTR(varcharColumnName,'-',1,2) - 1)
else '...'
end
from tableName;
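Of course, you have to handle the "else" the way you want. It works on postgres and oracle (tested), but be aware that INSTR with four arguments is not built into stock PostgreSQL (there it comes from an Oracle-compatibility extension such as orafce or from a user-defined function) and it is not standard SQL, so it may not be available on other DBMSs.
//edit - as a function; however, written this way it's rather hard to make it cross-DBMS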
CREATE OR REPLACE FUNCTION find_position_delimiter(fulltext varchar, delimiter varchar, occurence integer)
RETURNS varchar as
$BODY$
DECLARE
result varchar := '';
delimiterPos integer := 0;
BEGIN
delimiterPos := INSTR(fulltext,delimiter,1,occurence);
result := SUBSTR(fulltext, 1, delimiterPos - 1);
RETURN result;
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;
SELECT find_position_delimiter('My-Example-Awesome-String', '-', 2);
create or replace function trunc(string text, delimiter char, occurence int) returns text as $$
return delimiter.join(string.split(delimiter)[:occurence])
$$ language plpythonu;
# select trunc('My-Example-Awesome-String', '-', 2);
trunc
------------
My-Example
(1 row)
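The same split, slice and join idea can also be written with PostgreSQL's built-in array functions, without PL/Python. This is a sketch only, and the function name cut_before_nth is hypothetical:

```sql
-- Hypothetical name; keeps everything before the n-th occurrence of the delimiter.
CREATE OR REPLACE FUNCTION cut_before_nth(str text, delim text, n integer)
RETURNS text AS $$
    -- split on the literal delimiter, keep the first n parts, join them again
    SELECT array_to_string((string_to_array(str, delim))[1:n], delim);
$$ LANGUAGE sql IMMUTABLE;

SELECT cut_before_nth('My-Example-Awesome-String', '-', 2);  -- My-Example
```

string_to_array treats the delimiter as a literal string, whereas regexp_split_to_array in the prototype interprets it as a regular expression, so delimiters such as '.' or '|' behave differently between the two. If the string contains fewer than n delimiters, the whole string is returned.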

DB2 sql query to find non ascii characters in strings

I have a table (say ELEMENTS) with a VARCHAR field named NAME encoded in ccsid 1144. I need to find all the strings in the NAME field which contain "non ascii characters", that is characters that are in the ccsid 1144 set of characters without the ascii ones.
I think you should be able to create a function like this:
CREATE FUNCTION CONTAINS_NON_ASCII(INSTR VARCHAR(4000))
RETURNS CHAR(1)
DETERMINISTIC NO EXTERNAL ACTION CONTAINS SQL
BEGIN ATOMIC
DECLARE POS, LEN INT;
IF INSTR IS NULL THEN
RETURN NULL;
END IF;
SET (POS, LEN) = (1, LENGTH(INSTR));
WHILE POS <= LEN DO
IF ASCII(SUBSTR(INSTR, POS, 1)) > 127 THEN
RETURN 'Y';
END IF;
SET POS = POS + 1;
END WHILE;
RETURN 'N';
END
And then write:
SELECT NAME
FROM ELEMENTS
WHERE CONTAINS_NON_ASCII(NAME) = 'Y'
;
(Disclaimer: completely untested.)
By the way, judging by the documentation, it seems that VARCHAR is a string of bytes, not of Unicode characters. (Bytes range from 0 to 0xFF; Unicode code points range from 0 to 0x10FFFF.) If you're interested in supporting Unicode, you might want to use a different data type.