How get all matching positions in a string? - sql

I have a column flag_acumu in a table in PostgreSQL with values like:
'SSNSSNNNNNNNNNNNNNNNNNNNNNNNNNNNNSNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN'
I need to show all positions with an 'S'. With this code, I only get the first such position, but not the later ones.
SELECT codn_conce, flag_acumu, position('S' IN flag_acumu) AS the_pos
FROM dh12
WHERE position('S' IN flag_acumu) != 0
ORDER BY the_pos ASC;
How to get all of them?

In Postgres 9.4 or later you can conveniently use unnest() in combination with WITH ORDINALITY:
SELECT *
FROM dh12 d
JOIN unnest(string_to_array(d.flag_acumu, NULL))
WITH ORDINALITY u(elem, the_pos) ON u.elem = 'S'
WHERE d.flag_acumu LIKE '%S%' -- optional, see below
ORDER BY d.codn_conce, u.the_pos;
This returns one row per match.
WHERE d.flag_acumu LIKE '%S%' is optional to quickly eliminate source rows without any matches. Pays if there are more than a few such rows.
Detailed explanation and alternatives for older versions:
PostgreSQL unnest() with element number

Since you didn't specify your needs to a point in which one could answer properly, I'm going with my assumption that you want a list of positions of occurence of a substring (can be more than 1 character long).
Here's the function to do that using:
FOR .. LOOP control structure,
function substr(text, int, int).
CREATE OR REPLACE FUNCTION get_all_positions_of_substring(text, text)
RETURNS text
STABLE
STRICT
LANGUAGE plpgsql
AS $$
DECLARE
output_text TEXT := '';
BEGIN
FOR i IN 1..length($1)
LOOP
IF substr($1, i, length($2)) = $2 THEN
output_text := CONCAT(output_text, ';', i);
END IF;
END LOOP;
-- Remove first semicolon
output_text := substr(output_text, 2, length(output_text));
RETURN output_text;
END;
$$;
Sample call and output
postgres=# select * from get_all_positions_of_substring('soklesocmxsoso','so');
get_all_positions_of_substring
--------------------------------
1;6;11;13

This works too. And a bit faster I think.
create or replace function findAllposition(_pat varchar, _tar varchar)
returns int[] as
$body$
declare _poslist int[]; _pos int;
begin
_pos := position(_pat in _tar);
while (_pos>0)
loop
if array_length(_poslist,1) is null then
_poslist := _poslist || (_pos);
else
_poslist := _poslist || (_pos + _poslist[array_length(_poslist,1)] + 1);
end if;
_tar := substr(_tar, _pos + 1, length(_tar));
_pos := position(_pat in _tar);
end loop;
return _poslist;
end;
$body$
language plpgsql;
Will return a position list which is an int array.
{position1, position2, position3, etc.}

Related

Simple word replacement in output from SQL

I'm running PostgreSQL 9.4.
Is there a replace string function which can take an array of words, or other similar function?
Ex.
SELECT REPLACE(my_column, ['blue', 'red'], ['ColorBlue', 'ColorRed']);
So blue becomes ColorBlue, and red becomes ColorRed?
It's not only such simple replacements, but for the example I'm using this.
One way is create it:
create or replace function rep_arr(str text, src text[], rep text[])
returns text as $$
begin
for i in 1..array_length(src, 1) loop
str := replace(str, src[i], rep[i]);
end loop;
return str;
end; $$ language plpgsql
Call:
select rep_arr('bla bla blue bla red bla', '{blue,red}' , '{ColorBlue,ColorRed}');
I agree with #OtoShavadze that you can write your own function.
Here is my solution:
I use generate_subscripts(array anyarray, dim int) function as suggested in Searching in Arrays documentation.
CREATE OR REPLACE FUNCTION translate(string text, from_array text[], to_array text[])
RETURNS text AS
$BODY$
DECLARE
output text;
BEGIN
SELECT INTO output
to_array[idx]
FROM
generate_subscripts(from_array, 1) AS idx
WHERE
from_array[idx] = string; -- here you can change the search condition
IF FOUND THEN
RETURN output;
ELSE
RETURN string;
END IF;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
It finds and replaces whole words but you can change it (line marked in the code) to find only a substring, match case-insensitive, etc...
You should also add parameter checking: arrays should not be null, multidimensional nor differ in size:
IF from_array IS NULL OR to_array IS NULL THEN
RAISE EXCEPTION 'NULL parameters';
END IF;
IF array_ndims(from_array) != 1 OR array_ndims(to_array) != 1 THEN
RAISE EXCEPTION 'Multidimensional parameters';
END IF;
IF array_length(from_array, 1) != array_length(to_array, 1) THEN
RAISE EXCEPTION 'Parameters size differ';
END IF;
SELECT translate('red', ARRAY['blue', 'red'], ARRAY['ColorBlue', 'ColorRed']);
returns
ColorRed

Generate unique random strings in plpgsql

I am trying to write a function to create unique random tokens of variable length. However, I am stumped by the plpgsql syntax. My intention is to create a function which
Takes a table and column as input
Generates a random string of a given length, with a given set of characters
Checks if the string is already in the colum
If so (and this is expected to be rare), simply generate a new random string.
Otherwise, return the random string
My current attempt looks like this:
CREATE FUNCTION random_token(_table TEXT, _column TEXT, _length INTEGER) RETURNS text AS $$
DECLARE
alphanum CONSTANT text := 'abcdefghijkmnopqrstuvwxyz23456789';
range_head CONSTANT integer := 25;
range_tail CONSTANT integer := 33;
random_string text;
BEGIN
REPEAT
SELECT substring(alphanum from trunc(random() * range_head + 1)::integer for 1) ||
array_to_string(array_agg(substring(alphanum from trunc(random() * range_tail + 1)::integer for 1)), '')
INTO random_string FROM generate_series(1, _length - 1);
UNTIL random_string NOT IN FORMAT('SELECT %I FROM %I WHERE %I = random_string;', _column, _table, _column)
END REPEAT;
RETURN random_string;
END
$$ LANGUAGE plpgsql;
However, this doesn't work, and gives me a not very helpful error:
DatabaseError: error 'ERROR: syntax error at or near "REPEAT"
I have tried a number of variations, but without knowing what the error in the syntax is I am stumped. Any idea how to fix this function?
There is no repeat statement in plpgsql. Use simple loop.
CREATE OR REPLACE FUNCTION random_token(_table TEXT, _column TEXT, _length INTEGER) RETURNS text AS $$
DECLARE
alphanum CONSTANT text := 'abcdefghijkmnopqrstuvwxyz23456789';
range_head CONSTANT integer := 25;
range_tail CONSTANT integer := 33;
random_string text;
ct int;
BEGIN
LOOP
SELECT substring(alphanum from trunc(random() * range_head + 1)::integer for 1) ||
array_to_string(array_agg(substring(alphanum from trunc(random() * range_tail + 1)::integer for 1)), '')
INTO random_string FROM generate_series(1, _length - 1);
EXECUTE FORMAT('SELECT count(*) FROM %I WHERE %I = %L', _table, _column, random_string) INTO ct;
EXIT WHEN ct = 0;
END LOOP;
RETURN random_string;
END
$$ LANGUAGE plpgsql;
Note, random_string should be a parameter to format().
Update. According to the accurate hint from Abelisto, this should be faster for a large table:
DECLARE
dup boolean;
...
EXECUTE FORMAT('SELECT EXISTS(SELECT 1 FROM %I WHERE %I = %L)', _table, _column, random_string) INTO dup;
EXIT WHEN NOT dup;
...
This is almost certainly not what you want. When you say, "checks if the string is already in the column" you're not referring to something that looks unique, you're referring to something that actually is UNIQUE.
Instead, I would point you over this answer I gave about UUIDs.

PostgreSQL password generator

I need some help with sql pass generator. I already have a function which returns 8 random characters, but I have to be sure, that there are lowercase and uppercase characters and numbers. Any advice? Here is my old function.
CREATE FUNCTION f_generate_password() RETURNS text AS $$
DECLARE
password text;
chars text;
BEGIN
password := '';
chars :=
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
FOR i IN 1..8 LOOP
password := password || SUBSTRING(chars,
ceil(random()*LENGTH(chars))::integer, 1);
END LOOP;
return password;
END;
$$
LANGUAGE plpgsql;
Here is a solution for these with a same or similar problem :)
CREATE OR REPLACE FUNCTION f_generate_password()
RETURNS text AS
$BODY$
DECLARE
vPassword text;
chars text;
BEGIN
vPassword := '';
chars :=
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
WHILE((select COALESCE(substring(vPassword from '.*[a-z]+.*'),'')) = '' OR (select COALESCE(substring(vPassword from '.*[A-Z]+.*'),'')) = '' OR (select COALESCE(substring(vPassword from '.*[0-9]+.*'),'')) = '') LOOP
vPassword := '';
FOR i IN 1..8 LOOP
vPassword := vPassword || SUBSTRING(chars, ceil(random()*LENGTH(chars))::integer, 1);
END LOOP;
END LOOP;
return vPassword;
END;
$BODY$
LANGUAGE plpgsql;
If you wonder about an algorithm... I don't know PostgreSQL syntax/dialect but you can e.g.:
1) pick 3 random positions (1 to 8) and put 3 random lower-case letters there
2) pick 3 random positions (from the remaining ones) and put 3 random upper-case letters there
3) put 2 random digits in the remaining two positions.

How to cut varchar/text before n'th occurence of delimiter? PostgreSQL

I have strings (saved in database as varchar) and I have to cut them just before n'th occurence of delimiter.
Example input:
String: 'My-Example-Awesome-String'
Delimiter: '-'
Occurence: 2
Output:
My-Example
I implemented this function for fast prototype:
CREATE OR REPLACE FUNCTION find_position_delimiter(fulltext varchar, delimiter varchar, occurence integer)
RETURNS varchar AS
$BODY$
DECLARE
result varchar = '';
arr text[] = regexp_split_to_array( fulltext, delimiter);
word text;
counter integer := 0;
BEGIN
FOREACH word IN ARRAY arr LOOP
EXIT WHEN ( counter = occurence );
IF (counter > 0) THEN result := result || delimiter;
END IF;
result := result || word;
counter := counter + 1;
END LOOP;
RETURN result;
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;
SELECT find_position_delimiter('My-Example-Awesome-String', '-', 2);
For now it assumes that string is not empty (provided by query where I will call function) and delimiter string contains at least one delimiter of provided pattern.
But now I need something better for performance test. If it is possible, I would love to see the most universal solution, because not every user of my system is working on PostgreSQL database (few of them prefer Oracle, MySQL or SQLite), but it is not the most importatnt. But performance is - because on specific search, that function can be called even few hundreds times.
I didn't find anything about fast and easy using varchar as a table of chars and checking for occurences of delimiter (I could remember position of occurences and then create substring from first char to n'th delimiter position-1). Any ideas? Are smarter solutions?
# EDIT: yea, I know that function in every database will be a bit different, but body of function can be very similliar or the same. Generality is not a main goal :) And sorry for that bad function working-name, I just saw it has not right meaning.
you can try doing something based on this:
select
varcharColumnName,
INSTR(varcharColumnName,'-',1,2),
case when INSTR(varcharColumnName,'-',1,2) <> 0
THEN SUBSTR(varcharColumnName, 1, INSTR(varcharColumnName,'-',1,2) - 1)
else '...'
end
from tableName;
of course, you have to handle "else" the way you want. It works on postgres and oracle (tested), it should work on other dbms's because these are standard sql functions
//edit - as a function, however this way it's rather hard to make it cross-dbms
CREATE OR REPLACE FUNCTION find_position_delimiter(fulltext varchar, delimiter varchar, occurence integer)
RETURNS varchar as
$BODY$
DECLARE
result varchar := '';
delimiterPos integer := 0;
BEGIN
delimiterPos := INSTR(fulltext,delimiter,1,occurence);
result := SUBSTR(fulltext, 1, delimiterPos - 1);
RETURN result;
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;
SELECT find_position_delimiter('My-Example-Awesome-String', '-', 2);
create or replace function trunc(string text, delimiter char, occurence int) returns text as $$
return delimiter.join(string.split(delimiter)[:occurence])
$$ language plpythonu;
# select trunc('My-Example-Awesome-String', '-', 2);
trunc
------------
My-Example
(1 row)

String manipulation in a SQL result set

Im currently studying sql statements and got curious of this specific exercise in the net.
the problem is
"How can I encrypt every letter in a row that contain two or more spaces?( with spaces not included)"
I have created a sample table here
sample
-----------------------
first
first second
first second third
first second third fourth
Here is what I would like to get:
sample
-----------------------
first
first second
first ****** *****
first ****** ***** fourth
and this is what I've tried so far:
select name, substring(name, E'(\\s\\w+\\s.*)') from sample ;
It is too complicated to do it in simple select so I used function with rich PostgreSQL string functions:
CREATE OR REPLACE FUNCTION hide_middle(s varchar)
RETURNS varchar AS
$BODY$
DECLARE
r varchar;
arr varchar[];
BEGIN
r := s;
arr := regexp_matches(s, '^(\\S+ )(.*)( \\S+)$');
IF array_length(arr, 1) = 3 THEN
r := arr[1] || regexp_replace(arr[2], '\\S', '*', 'g') || arr[3];
END IF;
RETURN r;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
You can use it with:
select hide_middle('first second third fourth')