String manipulation in a SQL result set - sql

Im currently studying sql statements and got curious of this specific exercise in the net.
the problem is
"How can I encrypt every letter in a row that contain two or more spaces?( with spaces not included)"
I have created a sample table here
sample
-----------------------
first
first second
first second third
first second third fourth
Here is what I would like to get:
sample
-----------------------
first
first second
first ****** *****
first ****** ***** fourth
and this is what I've tried so far:
select name, substring(name, E'(\\s\\w+\\s.*)') from sample ;

It is too complicated to do it in simple select so I used function with rich PostgreSQL string functions:
CREATE OR REPLACE FUNCTION hide_middle(s varchar)
RETURNS varchar AS
$BODY$
DECLARE
r varchar;
arr varchar[];
BEGIN
r := s;
arr := regexp_matches(s, '^(\\S+ )(.*)( \\S+)$');
IF array_length(arr, 1) = 3 THEN
r := arr[1] || regexp_replace(arr[2], '\\S', '*', 'g') || arr[3];
END IF;
RETURN r;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
You can use it with:
select hide_middle('first second third fourth')

Related

Scope of column names, aliases, and OUT parameters in PL/pgSQL function

I have a hard time understanding why I can refer to the output columns in returns table(col type).
There is a subtle bug in the below code, the order by var refers to res in returns, not to data1 which we aliased to res. res in where is always null and we get 0 rows.
Why can I refer to the column name in output?
In what cases do I want this?
CREATE OR REPLACE FUNCTION public.test(var INTEGER)
RETURNS table(res int )
LANGUAGE plpgsql
AS $function$
begin
return query
select data1 res
from table_with_data
where res < var;
end
$function$
Why can I refer to the column name in output
From the manual, the section about function parameters:
column_name The name of an output column in the RETURNS TABLE syntax. This is effectively another way of declaring a named OUT parameter, except that RETURNS TABLE also implies RETURNS SETOF.
What this means is that in your case res is effectively a writeable variable, which type you plan to return a set of. As any other variable without a default value assigned, it starts off as null.
In what case do I want this
You can return multiple records from a function of this type with a single return query, but another way is by a series of multiple return query or return next - in the second case, filling out the fields in a record of your output table each time. You could have expected a return statement to end the function, but in this scenario only a single return; without anything added would have that effect.
create table public.test_res (data integer);
CREATE OR REPLACE FUNCTION public.test(var INTEGER)
RETURNS table(res int )
LANGUAGE plpgsql
AS $function$
begin
insert into public.test_res select res;--to inspect its initial value later
select 1 into res;
return next;
return next;--note that res isn't reset after returning next
return query select 2;--doesn't affect the current value of res
return next;--returning something else earlier didn't affect res either
return;--it will finish here
select 3 into res;
return next;
end
$function$;
select * from test(0);
-- res
-------
-- 1
-- 1
-- 2
-- 1
--(4 rows)
table public.test_res; --this was the initial value of res within the function
-- data
--------
-- null
--(1 row)
Which is the most useful with LOOPs
CREATE OR REPLACE FUNCTION public.test(var INTEGER)
RETURNS table(comment text,res int) LANGUAGE plpgsql AS $function$
declare rec record;
array_slice int[];
begin
return query select 'return query returned these multiple records in one go', a from generate_series(1,3,1) a(a);
res:=0;
comment:='loop exit when res>4';
loop exit when res>4;
select res+1 into res;
return next;
end loop;
comment:='while res between 5 and 8 loop';
while res between 5 and 8 loop
select res+2 into res;
return next;
end loop;
comment:='for element in reverse 3 .. -3 by 2 loop';
for element in reverse 3 .. -3 by 2 loop
select element into res;
return next;
end loop;
comment:='for <record> in <expression> loop';
for rec in select pid from pg_stat_activity where state<>'idle' loop
select rec.pid into res;
return next;
end loop;
comment:='foreach array_slice slice 1 in array arr loop';
foreach array_slice SLICE 1 in array ARRAY[[1,2,3],[11,12,13],[21,22,23]] loop
select array_slice[1] into res;
return next;
end loop;
end
$function$;
Example results
select * from public.test(0);
-- comment | res
----------------------------------------------------------+--------
-- return query returned these multiple records in one go | 1
-- return query returned these multiple records in one go | 2
-- return query returned these multiple records in one go | 3
-- loop exit when res>4 | 1
-- loop exit when res>4 | 2
-- loop exit when res>4 | 3
-- loop exit when res>4 | 4
-- loop exit when res>4 | 5
-- while res between 5 and 8 loop | 7
-- while res between 5 and 8 loop | 9
-- for element in reverse 3 .. -3 by 2 loop | 3
-- for element in reverse 3 .. -3 by 2 loop | 1
-- for element in reverse 3 .. -3 by 2 loop | -1
-- for element in reverse 3 .. -3 by 2 loop | -3
-- for <record> in <expression> loop | 118786
-- foreach array_slice slice 1 in array arr loop | 1
-- foreach array_slice slice 1 in array arr loop | 11
-- foreach array_slice slice 1 in array arr loop | 21
--(18 rows)
True, OUT parameters (including field names in a RETURNS TABLE (...) clause) are visible in all SQL DML statements in a PL/pgSQL function body, just like other variables. Find details in the manual chapters Variable Substitution and Returning from a Function for PL/pgSQL.
However, a more fundamental misunderstanding comes first here. The syntax of your nested SELECT is invalid to begin with. The PL/pgSQL variable happens to mask this problem (with a different problem). In SQL, you cannot refer to output column names (column aliases in the SELECT clause) in the WHERE clause. This is invalid:
select data1 res
from table_with_data
where res < var;
The manual:
An output column's name can be used to refer to the column's value in
ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses;
there you must write out the expression instead.
This is different for ORDER BY, which you mention in the text, but don't include in the query. See:
GROUP BY + CASE statement
Fixing immediate issue
Could be repaired like this:
CREATE OR REPLACE FUNCTION public.test1(var int)
RETURNS TABLE(res int)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY
SELECT data1 AS res -- column alias is just noise (or documentation)
FROM table_with_data
WHERE data1 < var; -- original column name!
END
$func$
fiddle
See:
Real number comparison for trigram similarity
The column alias is just noise in this case. The name of the column returned from the function is res in any case - as defined in the RETURNS TABLE clause.
Aside: It's recommended not to omit the AS keyword for column aliases (unlike table aliases). See:
Query to ORDER BY the number of rows returned from another SELECT
If there was actual ambiguity between column and variable name - say, you declared an OUT parameter or variable named data1 - you'd get an error message like this:
ERROR: column reference "data1" is ambiguous
LINE 2: select data1
^
DETAIL: It could refer to either a PL/pgSQL variable or a table column.
Brute force fix
Could be fixed with a special command at the start of the function body:
CREATE OR REPLACE FUNCTION public.test3(var int)
RETURNS TABLE(data1 int)
LANGUAGE plpgsql AS
$func$
#variable_conflict use_column -- ! to resolve conflicts
BEGIN
RETURN QUERY
SELECT data1
FROM table_with_data
WHERE data1 < var; -- !
END
$func$
See:
Naming conflict between function parameter and result of JOIN with USING clause
Proper fix
Table-qualify column names, and avoid conflicting variable names to begin with.
CREATE OR REPLACE FUNCTION public.test4(_var int)
RETURNS TABLE(res int)
LANGUAGE plpgsql STABLE AS
$func$
BEGIN
RETURN QUERY
SELECT t.data1 -- table-qualify column name
FROM table_with_data t
WHERE t.data1 < _var; -- !
END
$func$
Example:
Calling a PostgreSQL function from Java

Is it possible to perform multiple replacements in one function call?

If you want to replace multiple strings in one go, you can of course nest the REPLACE function, eg. like this:
SELECT REPLACE(REPLACE(REPLACE(foo, 'apple', 'fruit'), 'banana', 'fruit'), 'lettuce', 'vegetable')
FROM bar
If you have to do a lot of replacing, your code will become ugly and hard to read. Is there such a thing as a multi-replace function? Which would maybe take 2 arrays as arguments? To be sure, I'm familiar with the TRANSLATE function, but as my example indicates I want to replace whole words, not just single characters.
I would implement such a function in the following way:
CREATE OR REPLACE FUNCTION multi_replace(_string text, variadic _param_args text[])
RETURNS TEXT
AS
$BODY$
DECLARE
_index integer;
BEGIN
FOR _index IN 1 .. cardinality(_param_args) - 1 by 2 loop
_string := replace(_string, _param_args[_index], _param_args[_index+1]);
end loop;
RETURN _string;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
cardinality() returns the length of the parameter array, and by 2 increases the loop index by 2 for every iteration so that it's safe to use _index and _index + 1 inside the loop to access the pairs that belong together.
Online example: https://rextester.com/LITG61720
I was looking for cleanliness, not performance, so I went with your suggestion 404.
This is the function I wrote:
CREATE OR REPLACE FUNCTION multi_replace(
_string TEXT,
VARIADIC _param_args TEXT[]
)
RETURNS TEXT AS
$BODY$
DECLARE
_old RECORD;
_new TEXT;
BEGIN
FOR _old in (
SELECT
_param_args[i] AS tekst,
i AS indexnummer
FROM generate_series(1, 5, 2) i
)
LOOP
_new := COALESCE(_param_args[_old.indexnummer+1], '');
_string := REPLACE(_string, _old.tekst, _new);
END LOOP;
RETURN _string;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
And this is a sample call:
select multi_replace('I love to drink milk in the morning', 'milk', 'beer', 'morning', 'evening', 'I', 'we');
Odd that this isn't a standard function, maybe a good idea for the next release? Thanks everybody for the suggestions!

How get all matching positions in a string?

I have a column flag_acumu in a table in PostgreSQL with values like:
'SSNSSNNNNNNNNNNNNNNNNNNNNNNNNNNNNSNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN'
I need to show all positions with an 'S'. With this code, I only get the first such position, but not the later ones.
SELECT codn_conce, flag_acumu, position('S' IN flag_acumu) AS the_pos
FROM dh12
WHERE position('S' IN flag_acumu) != 0
ORDER BY the_pos ASC;
How to get all of them?
In Postgres 9.4 or later you can conveniently use unnest() in combination with WITH ORDINALITY:
SELECT *
FROM dh12 d
JOIN unnest(string_to_array(d.flag_acumu, NULL))
WITH ORDINALITY u(elem, the_pos) ON u.elem = 'S'
WHERE d.flag_acumu LIKE '%S%' -- optional, see below
ORDER BY d.codn_conce, u.the_pos;
This returns one row per match.
WHERE d.flag_acumu LIKE '%S%' is optional to quickly eliminate source rows without any matches. Pays if there are more than a few such rows.
Detailed explanation and alternatives for older versions:
PostgreSQL unnest() with element number
Since you didn't specify your needs to a point in which one could answer properly, I'm going with my assumption that you want a list of positions of occurence of a substring (can be more than 1 character long).
Here's the function to do that using:
FOR .. LOOP control structure,
function substr(text, int, int).
CREATE OR REPLACE FUNCTION get_all_positions_of_substring(text, text)
RETURNS text
STABLE
STRICT
LANGUAGE plpgsql
AS $$
DECLARE
output_text TEXT := '';
BEGIN
FOR i IN 1..length($1)
LOOP
IF substr($1, i, length($2)) = $2 THEN
output_text := CONCAT(output_text, ';', i);
END IF;
END LOOP;
-- Remove first semicolon
output_text := substr(output_text, 2, length(output_text));
RETURN output_text;
END;
$$;
Sample call and output
postgres=# select * from get_all_positions_of_substring('soklesocmxsoso','so');
get_all_positions_of_substring
--------------------------------
1;6;11;13
This works too. And a bit faster I think.
create or replace function findAllposition(_pat varchar, _tar varchar)
returns int[] as
$body$
declare _poslist int[]; _pos int;
begin
_pos := position(_pat in _tar);
while (_pos>0)
loop
if array_length(_poslist,1) is null then
_poslist := _poslist || (_pos);
else
_poslist := _poslist || (_pos + _poslist[array_length(_poslist,1)] + 1);
end if;
_tar := substr(_tar, _pos + 1, length(_tar));
_pos := position(_pat in _tar);
end loop;
return _poslist;
end;
$body$
language plpgsql;
Will return a position list which is an int array.
{position1, position2, position3, etc.}

How to cut varchar/text before n'th occurence of delimiter? PostgreSQL

I have strings (saved in database as varchar) and I have to cut them just before n'th occurence of delimiter.
Example input:
String: 'My-Example-Awesome-String'
Delimiter: '-'
Occurence: 2
Output:
My-Example
I implemented this function for fast prototype:
CREATE OR REPLACE FUNCTION find_position_delimiter(fulltext varchar, delimiter varchar, occurence integer)
RETURNS varchar AS
$BODY$
DECLARE
result varchar = '';
arr text[] = regexp_split_to_array( fulltext, delimiter);
word text;
counter integer := 0;
BEGIN
FOREACH word IN ARRAY arr LOOP
EXIT WHEN ( counter = occurence );
IF (counter > 0) THEN result := result || delimiter;
END IF;
result := result || word;
counter := counter + 1;
END LOOP;
RETURN result;
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;
SELECT find_position_delimiter('My-Example-Awesome-String', '-', 2);
For now it assumes that string is not empty (provided by query where I will call function) and delimiter string contains at least one delimiter of provided pattern.
But now I need something better for performance test. If it is possible, I would love to see the most universal solution, because not every user of my system is working on PostgreSQL database (few of them prefer Oracle, MySQL or SQLite), but it is not the most importatnt. But performance is - because on specific search, that function can be called even few hundreds times.
I didn't find anything about fast and easy using varchar as a table of chars and checking for occurences of delimiter (I could remember position of occurences and then create substring from first char to n'th delimiter position-1). Any ideas? Are smarter solutions?
# EDIT: yea, I know that function in every database will be a bit different, but body of function can be very similliar or the same. Generality is not a main goal :) And sorry for that bad function working-name, I just saw it has not right meaning.
you can try doing something based on this:
select
varcharColumnName,
INSTR(varcharColumnName,'-',1,2),
case when INSTR(varcharColumnName,'-',1,2) <> 0
THEN SUBSTR(varcharColumnName, 1, INSTR(varcharColumnName,'-',1,2) - 1)
else '...'
end
from tableName;
of course, you have to handle "else" the way you want. It works on postgres and oracle (tested), it should work on other dbms's because these are standard sql functions
//edit - as a function, however this way it's rather hard to make it cross-dbms
CREATE OR REPLACE FUNCTION find_position_delimiter(fulltext varchar, delimiter varchar, occurence integer)
RETURNS varchar as
$BODY$
DECLARE
result varchar := '';
delimiterPos integer := 0;
BEGIN
delimiterPos := INSTR(fulltext,delimiter,1,occurence);
result := SUBSTR(fulltext, 1, delimiterPos - 1);
RETURN result;
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;
SELECT find_position_delimiter('My-Example-Awesome-String', '-', 2);
create or replace function trunc(string text, delimiter char, occurence int) returns text as $$
return delimiter.join(string.split(delimiter)[:occurence])
$$ language plpythonu;
# select trunc('My-Example-Awesome-String', '-', 2);
trunc
------------
My-Example
(1 row)

How can I retrieve a column value of a specific row

I'm using PostgreSQL 9.3.
The table partner.partner_statistic contains the following columns:
id reg_count
serial integer
I wrote the function convert(integer):
CREATE FUNCTION convert(d integer) RETURNS integer AS $$
BEGIN
--Do something and return integer result
END
$$ LANGUAGE plpgsql;
And now I need to write a function returned array of integers as follows:
CREATE FUNCTION res() RETURNS integer[] AS $$
<< outerblock >>
DECLARE
arr integer[]; --That array of integers I need to fill in depends on the result of query
r partner.partner_statistic%rowtype;
table_name varchar DEFAULT 'partner.partner_statistic';
BEGIN
FOR r IN
SELECT * FROM partner.partner_statistic offset 0 limit 100
LOOP
--
-- I need to add convert(r[reg_count]) to arr where r[id] = 0 (mod 5)
--
-- How can I do that?
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;
You don't need (and shouldn't use) PL/PgSQL loops for this. Just use an aggregate. I'm kind of guessing about what you mean by "where r[id] = 0 (mod 5) but I'm assuming you mean "where id is evenly divisible by 5". (Note that this is NOT the same thing as "every fifth row" because generated IDs have gaps).
Something like:
SELECT array_agg(r.reg_count)
FROM partner.partner_statistic
WHERE id % 5 = 0
LIMIT 100
probably meets your needs.
If you want to return the value, use RETURN QUERY SELECT ... or preferably use a simple sql language function.
If you want a dynamic table name, use:
RETURN QUERY EXECUTE format('
SELECT array_agg(r.reg_count)
FROM %I
WHERE id % 5 = 0
LIMIT 100', table_name::regclass);