How to include quote in plpgsql function - sql

The following function identifies columns that contain only null values. How can I extend the WHERE clause to check for null or empty values, along the lines of:
coalesce(TRIM(string), '') = ''
CREATE OR REPLACE FUNCTION public.is_column_empty(IN table_name varchar, IN column_name varchar)
RETURNS bool
LANGUAGE plpgsql
AS $function$
declare
count integer;
BEGIN
execute FORMAT('SELECT COUNT(*) from %s WHERE %s IS NOT NULL', table_name, quote_ident(column_name)) into count;
RETURN (count = 0);
END;
$function$
;

There are several possibilities. For example, you can use dollar quoting with a custom delimiter for the query string:
CREATE OR REPLACE FUNCTION public.is_column_empty(IN table_name varchar,
IN column_name varchar)
RETURNS bool
LANGUAGE plpgsql
AS $function$
DECLARE _found boolean; /* careful: "count" is a keyword */
BEGIN
EXECUTE format($_$SELECT EXISTS(SELECT * FROM %I WHERE COALESCE(trim(%I), '') <> '')$_$,
table_name, column_name)
INTO _found;
RETURN NOT _found;
END;
$function$;
Your example has more issues:
Don't use count() unless you really need to know the number of rows (items). It can be pretty slow on bigger tables; EXISTS can stop at the first matching row.
SQL keywords are usually written in uppercase.
Don't use variable names that are SQL or PL/pgSQL keywords (reserved or unreserved); they can cause problems in some contexts (count, user, ...).
This is a classic example of chaos in the data. You should disallow empty strings in the data. Then you can use an index and the predicate COLNAME IS NOT NULL, which can be pretty fast.
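The last point can be sketched as follows. The table demo_items and its column note are hypothetical names used only for illustration; the idea is a CHECK constraint that rejects empty or whitespace-only strings, so that IS NOT NULL alone identifies filled values:

```sql
-- Hypothetical table; all names are for illustration only.
CREATE TABLE demo_items (
    id   integer PRIMARY KEY,
    note text CHECK (trim(note) <> '')   -- NULL passes; '' and '   ' are rejected
);

-- Partial index covering only the rows that actually have a value.
CREATE INDEX demo_items_note_idx ON demo_items (id) WHERE note IS NOT NULL;

INSERT INTO demo_items VALUES (1, 'hello'), (2, NULL);

-- With empty strings ruled out, the emptiness test needs no COALESCE/TRIM.
SELECT NOT EXISTS (SELECT 1 FROM demo_items WHERE note IS NOT NULL) AS is_empty;
```

Whether the planner actually uses the partial index depends on the table size and statistics, so treat the index as an option to measure rather than a guaranteed speed-up.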

You need to double up the quotation marks, like this:
CREATE OR REPLACE FUNCTION public.is_column_empty(IN table_name varchar, IN column_name varchar)
RETURNS bool
LANGUAGE plpgsql
AS $function$
declare
count integer;
BEGIN
execute FORMAT('SELECT COUNT(*) from %s WHERE COALESCE(TRIM(%s),'''') <> ''''', table_name, quote_ident(column_name)) into count;
RETURN (count = 0);
END;
$function$
;
EDIT:
Re-reading your question, I was a little unsure that you are getting what you want. As it stands the function returns false if at least one row has a value in the given column, even if all the other rows are empty. Is this really what you want, or are you rather looking for columns where any row has this column empty?
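For comparison, the doubled-quote form used here and a dollar-quoted literal produce the same string. A minimal check in plain SQL (col is just a placeholder name inside the string, and the $_$ tag is an arbitrary choice):

```sql
SELECT 'COALESCE(TRIM(col), '''') <> '''''     AS doubled_quotes,
       $_$COALESCE(TRIM(col), '') <> ''$_$     AS dollar_quoted,
       'COALESCE(TRIM(col), '''') <> '''''
         = $_$COALESCE(TRIM(col), '') <> ''$_$ AS same_string;
```

Inside a function body that is itself delimited by $function$, the inner literal only needs a different tag, such as $_$, which is why no quote doubling is required there.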

Related

Iterate through column names to get counts in a PL/pgSQL function

I have a table in my Postgres database that I'm trying to determine fill rates for (that is, I'm trying to understand how often data is or isn't missing). I need to make a function that, for each column (in a list of a couple dozen columns I've selected), counts the number and percentage of rows with non-null values.
The problem is, I don't really know how to iterate through a list of columns in a programmatic way, because I don't know how to reference a column from a string of its name. I've read about how you can use the EXECUTE command to run dynamically-written SQL, but I haven't been able to get it to work. Here's my current function:
CREATE OR REPLACE FUNCTION get_fill_rates() RETURNS TABLE (field_name text, fill_count integer, fill_percentage float) AS $$
DECLARE
fields text[] := array['column_a', 'column_b', 'column_c'];
total_rows integer;
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = 'my_table';
FOR i IN array_lower(fields, 1) .. array_upper(fields, 1)
LOOP
field_name := fields[i];
EXECUTE 'SELECT COUNT(*) FROM my_table WHERE $1 IS NOT NULL' INTO fill_count USING field_name;
fill_percentage := fill_count::float / total_rows::float;
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM get_fill_rates() ORDER BY fill_count DESC;
This function, as written, returns every field as having a 100% fill rate, which I know to be false. How can I make this function work?
I know you already solved it, but let me suggest that you avoid concatenating identifiers into dynamic queries. You can use format() with the %I identifier specifier instead:
CREATE OR REPLACE FUNCTION get_fill_rates() RETURNS TABLE (field_name text, fill_count integer, fill_percentage float) AS $$
DECLARE
fields text[] := array['column_a', 'column_b', 'column_c'];
table_name name := 'my_table';
total_rows integer;
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = table_name;
FOREACH field_name IN ARRAY fields
LOOP
EXECUTE format('SELECT COUNT(*) FROM %I WHERE %I IS NOT NULL', table_name, field_name) INTO fill_count;
fill_percentage := fill_count::float / total_rows::float;
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Doing it this way helps prevent SQL injection attacks, because %I quotes each identifier where necessary.
I figured out the solution after I wrote my question but before I submitted it -- since I've already done the work of writing the question, I'll just go ahead and share the answer. The problem was in my EXECUTE statement, specifically with that USING field_name bit. I think it was getting treated as a string literal when I did it that way, which meant the query was evaluating whether "a string literal" IS NOT NULL, which, of course, is always true.
Instead of parameterizing the column name, I need to inject it directly into the query string. So, I changed my EXECUTE line to the following:
EXECUTE 'SELECT COUNT(*) FROM my_table WHERE ' || field_name || ' IS NOT NULL' INTO fill_count;
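The difference between passing a value and splicing in an identifier can be reproduced in isolation. This is a minimal sketch on a made-up table demo_fill; it uses format() with %I for the identifier rather than plain concatenation, which is a swapped-in choice and not the code from the question:

```sql
-- Hypothetical table with one filled row and two NULLs.
CREATE TABLE demo_fill (column_a text);
INSERT INTO demo_fill VALUES ('x'), (NULL), (NULL);

DO $$
DECLARE
    as_value      bigint;
    as_identifier bigint;
BEGIN
    -- $1 is bound as a text value, so the test is 'column_a' IS NOT NULL,
    -- which holds for every row.
    EXECUTE 'SELECT count(*) FROM demo_fill WHERE $1 IS NOT NULL'
    INTO as_value USING 'column_a'::text;

    -- %I splices the name in as a quoted identifier, so the column is tested.
    EXECUTE format('SELECT count(*) FROM demo_fill WHERE %I IS NOT NULL', 'column_a')
    INTO as_identifier;

    RAISE NOTICE 'as value: %, as identifier: %', as_value, as_identifier;
END
$$;
```

USING is still the right tool for data values such as the right-hand side of a comparison; it just cannot stand in for table or column names.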
Some problems in the code aside (see below), this can be substantially faster and simpler with a single scan over the table in a plain query:
SELECT v.*
FROM (
SELECT count(column_a) AS ct_column_a
, count(column_b) AS ct_column_b
, count(column_c) AS ct_column_c
, count(*)::numeric AS ct
FROM my_table
) sub
, LATERAL (
VALUES
(text 'column_a', ct_column_a, round(ct_column_a / ct, 3))
, (text 'column_b', ct_column_b, round(ct_column_b / ct, 3))
, (text 'column_c', ct_column_c, round(ct_column_c / ct, 3))
) v(field_name, fill_count, fill_percentage);
The crucial "trick" here is that count() only counts non-null values to begin with, no tricks required.
I rounded the percentage to 3 decimal digits, which is optional. For this I cast to numeric.
Use a VALUES expression to unpivot the results and get one row per field.
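The behaviour of count() relied on here can be checked on a tiny inline data set (the alias t(x) is made up for the illustration):

```sql
-- count(*) counts rows; count(x) counts only rows where x is not NULL.
SELECT count(*) AS all_rows,
       count(x) AS non_null_x
FROM (VALUES (1), (NULL), (3)) AS t(x);
```

With three rows of which one is NULL, the two counts differ by exactly that one row.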
For repeated use or if you have a long list of columns to process, you can generate and execute the query dynamically. But, again, don't run a separate count for each column. Just build above query dynamically:
CREATE OR REPLACE FUNCTION get_fill_rates(tbl regclass, fields text[])
RETURNS TABLE (field_name text, fill_count bigint, fill_percentage numeric) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
-- RAISE NOTICE '%', ( -- to debug if needed
SELECT
'SELECT v.*
FROM (
SELECT count(*)::numeric AS ct
, ' || string_agg(format('count(%I) AS %I', fld, 'ct_' || fld), ', ') || '
FROM ' || tbl || '
) sub
, LATERAL (
VALUES
(text ' || string_agg(format('%L, %2$I, round(%2$I/ ct, 3))', fld, 'ct_' || fld), ', (') || '
) v(field_name, fill_count, fill_pct)
ORDER BY v.fill_count DESC'
FROM unnest(fields) fld
);
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM get_fill_rates('my_table', '{column_a, column_b, column_c}');
As you can see, this works for any given table and column list now.
And all identifiers are properly quoted automatically, using format() or by the built-in virtues of the regclass type.
Related:
Table name as a PostgreSQL function parameter
How to unpivot a table in PostgreSQL
Query for crosstab view
Convert one row into multiple rows with fewer columns
Your original query could be improved like this, but this is just lipstick on a pig. Do not use this inefficient approach.
CREATE OR REPLACE FUNCTION get_fill_rates()
RETURNS TABLE (field_name text, fill_count bigint, fill_percentage float) AS
$$
DECLARE
fields text[] := '{column_a, column_b, column_c}'; -- must be legal identifiers!
total_rows float; -- use float right away
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = 'my_table';
FOREACH field_name IN ARRAY fields -- use FOREACH
LOOP
EXECUTE 'SELECT COUNT(*) FROM my_table WHERE ' || field_name || ' IS NOT NULL'
INTO fill_count;
fill_percentage := fill_count / total_rows; -- already type float
RETURN NEXT;
END LOOP;
END
$$ LANGUAGE plpgsql;
Plus, pg_class.reltuples is only an estimate. Since you are counting anyway, use an actual count.
Related:
Iterating over integer[] in PL/pgSQL
Fast way to discover the row count of a table in PostgreSQL

Postgres dynamically select all text columns - subquery?

I want a select that returns all fields in a table that are of type = "character varying". It needs to run across multiple tables, so needs to be dynamic.
I was trying to use a subquery to first get the text columns, and then run the query:
SELECT (SELECT STRING_AGG(QUOTE_IDENT(column_name), ', ') FROM
information_schema.columns WHERE table_name = 'foo'
AND data_type = 'character varying') FROM foo;
But that's not working, I just get a list of column names but not the values. Does anyone know how I can make it work or a better way to do it?
Thank you,
Ben
You need PL/pgSQL for this, as PostgreSQL doesn't support dynamic SQL in its plain SQL dialect.
CREATE OR REPLACE FUNCTION get_cols(target_table text) RETURNS SETOF record AS $$
DECLARE
cols text;
BEGIN
cols := (SELECT STRING_AGG(QUOTE_IDENT(column_name), ', ')
FROM information_schema.columns
WHERE table_name = target_table
AND data_type = 'character varying');
RETURN QUERY EXECUTE 'SELECT '||cols||' FROM '||quote_ident(target_table)||';';
END;
$$
LANGUAGE plpgsql;
However, you'll find this hard to call, as you need to know the result column list to be able to call it. That kind of defeats the point. You'll need to massage the result into a concrete type. I convert to hstore here, but you could return json or an array or whatever, really:
CREATE OR REPLACE FUNCTION get_cols(target_table text) RETURNS SETOF hstore AS $$
DECLARE
cols text;
BEGIN
cols := (SELECT STRING_AGG(QUOTE_IDENT(column_name), ', ')
FROM information_schema.columns
WHERE table_name = target_table
AND data_type = 'character varying');
RETURN QUERY EXECUTE 'SELECT hstore(ROW('||cols||')) FROM '||quote_ident(target_table)||';';
END;
$$
LANGUAGE plpgsql;
Dynamic SQL is a pain; consider doing this at the application level.

Returning results from a function in 'select statement' format

I have a function that looks like this:
CREATE OR REPLACE FUNCTION mffcu.test_ty_hey()
RETURNS setof record
LANGUAGE plpgsql
AS $function$
Declare
cname1 text;
sql2 text;
Begin
for cname1 in
select array_to_string(useme, ', ') from (
select array_agg(column_name) as useme
from(
select column_name::text
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'crosstab_183'
and ordinal_position != 1
) as fin
) as fine
loop
sql2 := 'select distinct array['|| cname1 ||'] from mffcu.crosstab_183';
execute sql2;
end loop;
END;
$function$
I call the function with this:
select mffcu.test_ty_hey()
How do I return the results of the sql2 query without creating a table/temporary table?
While @Pavel is right, of course, your very convoluted function could be untangled to:
CREATE OR REPLACE FUNCTION mffcu.test_ty_hey()
RETURNS SETOF text[] LANGUAGE plpgsql
AS $func$
DECLARE
cname1 text;
BEGIN
FOR cname1 IN
SELECT column_name::text
FROM information_schema.columns
WHERE table_name = 'crosstab_183'
AND table_schema = 'mffcu'
AND ordinal_position <> 1
LOOP
RETURN QUERY
EXECUTE format('SELECT DISTINCT ARRAY[%I::text]
FROM mffcu.crosstab_183', cname1);
END LOOP;
END
$func$
format() requires PostgreSQL 9.1 or later. In 9.0 you can substitute with:
EXECUTE 'SELECT DISTINCT ARRAY['|| quote_ident(cname1) ||'::text]
FROM mffcu.crosstab_183';
Call:
select * FROM mffcu.test_ty_hey();
By casting each column to text we arrive at a consistent data type that can be used to declare the RETURN type. This compromise has to be made to return various data types from one function. Every data type can be cast to text, so that's the obvious common ground.
BTW, I have trouble imagining what the ARRAY wrapper around every single value should be good for. I suppose you could just drop that.
PostgreSQL functions must have a fixed result type before execution. You cannot specify the type late, during execution. There are only two workarounds: using temp tables or using cursors.
PL/pgSQL is not a good fit for overly generic routines. It is good for implementing strict, clean business rules, and bad for generic crosstab calculations, generic auditing, or similar generic tasks. It works, but the code is slower and usually not very maintainable.
To answer your question, though: you can use an output cursor.
Example: http://okbob.blogspot.cz/2008/08/using-cursors-for-generating-cross.html

plpgsql function issue

I have the following plpgsql procedure:
DECLARE
_r record;
point varchar[] := '{}';
i int := 0;
BEGIN
FOR _r IN EXECUTE ' SELECT a.'|| quote_ident(column) || ' AS point,
FROM ' || quote_ident (table) ||' AS a'
LOOP
point[i] = _r;
i = i+1;
END LOOP;
RETURN 'OK';
END;
Its main objective is to traverse a table and store each value of the row in an array. I am still new to plpgsql. Can anyone point out what the error is? It is giving me the following error;
This is the complete syntax (note that I renamed the parameter column to col_name, as column is a reserved word. The same goes for table):
create or replace function foo(col_name text, table_name text)
returns text
as
$body$
DECLARE
_r record;
point character varying[] := '{}';
i int := 0;
BEGIN
FOR _r IN EXECUTE 'SELECT a.'|| quote_ident(col_name) || ' AS pt, FROM ' || quote_ident (table_name) ||' AS a'
loop
point[i] = _r;
i = i+1;
END LOOP;
RETURN 'OK';
END;
$body$
language plpgsql;
Although, to be honest, I fail to see what you are trying to achieve here.
@a_horse fixes most of the crippling problems with your failed attempt.
However, nobody should use this. The following step-by-step instructions should lead to a sane implementation with modern PostgreSQL.
Phase 1: Remove errors and mischief
Remove the comma after the SELECT list to fix the syntax error.
You start your array with 0, while the default is to start with 1. Only do this if you need to do it. Leads to unexpected results if you operate with array_upper() et al. Start with 1 instead.
Change RETURN type to varchar[] to return the assembled array and make this demo useful.
What we have so far:
CREATE OR REPLACE FUNCTION foo(tbl varchar, col varchar)
RETURNS varchar[] LANGUAGE plpgsql AS
$BODY$
DECLARE
_r record;
points varchar[] := '{}';
i int := 0;
BEGIN
FOR _r IN
EXECUTE 'SELECT a.'|| quote_ident(col) || ' AS pt
FROM ' || quote_ident (tbl) ||' AS a'
LOOP
i = i + 1; -- reversed order to make array start with 1
points[i] = _r;
END LOOP;
RETURN points;
END;
$BODY$;
Phase 2: Remove cruft, make it useful
Use text instead of character varying / varchar for simplicity. Either works, though.
You are selecting a single column, but use a variable of type record. This way a whole record is coerced to text, which includes the surrounding parentheses. That hardly makes any sense. Use a text variable instead. It works for any column if you explicitly cast to text (::text). Any type can be cast to text.
There is no point in initializing the variable points. It can start as NULL here.
Table and column aliases inside EXECUTE are of no use in this case. Dynamically executed SQL has its own scope.
No semicolon (;) needed after final END in a plpgsql function.
It's simpler to just append each value to the array with || .
Almost sane:
CREATE OR REPLACE FUNCTION foo1(tbl text, col text)
RETURNS text[] LANGUAGE plpgsql AS
$func$
DECLARE
point text;
points text[];
BEGIN
FOR point IN
EXECUTE 'SELECT '|| quote_ident(col) || '::text FROM ' || quote_ident(tbl)
LOOP
points = points || point;
END LOOP;
RETURN points;
END
$func$;
Phase 3: Make it shine in modern PL/pgSQL
If you pass a table name as text, you create an ambiguous situation. You can prevent SQLi just fine with format() or quote_ident(), but this will fail with tables outside your search_path.
Then you need to add schema-qualification, which creates an ambiguous value. 'x.y' could stand for the table name "x.y" or the schema-qualified table name "x"."y". You can't pass "x"."y" since that will be escaped into """x"".""y""". You'd need to either use an additional parameter for the schema name or one parameter of type regclass. regclass is automatically quoted as needed when coerced to text and is the elegant solution here.
The new format() is simpler than multiple (or even a single) quote_ident() call.
You did not specify any order. SELECT returns rows in arbitrary order without ORDER BY. This may seem stable, since the result is generally reproducible as long as the underlying table doesn't change. But that's 100% unreliable. You probably want to add some kind of ORDER BY.
Finally, you don't need to loop at all. Use a plain SELECT with an Array constructor.
Use an OUT parameter to further simplify the code.
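The ambiguity around schema-qualified names can be seen directly. This sketch assumes a made-up schema demo_schema that is not on the search_path and a made-up table "My Table":

```sql
-- Hypothetical schema and table, created only for the demonstration.
CREATE SCHEMA demo_schema;
CREATE TABLE demo_schema."My Table" (id int);

SELECT quote_ident('demo_schema.My Table')              AS one_identifier,  -- quoted as a single name
       format('%I.%I', 'demo_schema', 'My Table')       AS two_parameters,  -- schema and table quoted separately
       format('%s', 'demo_schema."My Table"'::regclass) AS via_regclass;    -- regclass quotes itself as needed
```

A side effect of regclass worth knowing: the cast fails immediately if the table does not exist, so a misspelled name is caught before any dynamic SQL is built.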
Proper solution:
CREATE OR REPLACE FUNCTION f_arr(tbl regclass, col text, OUT arr text[])
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format('SELECT ARRAY(SELECT %I::text FROM %s ORDER BY 1)', col, tbl)
INTO arr;
END
$func$;
Call:
SELECT f_arr('myschema.mytbl', 'mycol');

Can I have a postgres plpgsql function return variable-column records?

I want to create a postgres function that builds the set of columns it
returns on-the-fly; in short, it should take in a list of keys, build
one column per-key, and return a record consisting of whatever that set
of columns was. Briefly, here's the code:
CREATE OR REPLACE FUNCTION reports.get_activities_for_report() RETURNS int[] AS $F$
BEGIN
RETURN ARRAY(SELECT activity_id FROM public.activity WHERE activity_id NOT IN (1, 2));
END;
$F$
LANGUAGE plpgsql
STABLE;
CREATE OR REPLACE FUNCTION reports.get_amount_of_time_query(format TEXT, _activity_id INTEGER) RETURNS TEXT AS $F$
DECLARE
_label TEXT;
BEGIN
SELECT label INTO _label FROM public.activity WHERE activity_id = _activity_id;
IF _label IS NOT NULL THEN
IF lower(format) = 'percentage' THEN
RETURN $$TO_CHAR(100.0 *$$ ||
$$ (SUM(CASE WHEN activity_id = $$ || _activity_id || $$ THEN EXTRACT(EPOCH FROM ended - started) END) /$$ ||
$$ SUM(EXTRACT(EPOCH FROM ended - started))),$$ ||
$$ '990.99 %') AS $$ || quote_ident(_label);
ELSE
RETURN $$SUM(CASE WHEN activity_id = $$ || _activity_id || $$ THEN ended - started END)$$ ||
$$ AS $$ || quote_ident(_label);
END IF;
END IF;
END;
$F$
LANGUAGE plpgsql
STABLE;
CREATE OR REPLACE FUNCTION reports.build_activity_query(format TEXT, activities int[]) RETURNS TEXT AS $F$
DECLARE
_activity_id INT;
query TEXT;
_activity_count INT;
BEGIN
_activity_count := array_upper(activities, 1);
query := $$SELECT agent_id, portal_user_id, SUM(ended - started) AS total$$;
FOR i IN 1.._activity_count LOOP
_activity_id := activities[i];
query := query || ', ' || reports.get_amount_of_time_query(format, _activity_id);
END LOOP;
query := query || $$ FROM public.activity_log_final$$ ||
$$ LEFT JOIN agent USING (agent_id)$$ ||
$$ WHERE started::DATE BETWEEN actual_start_date AND actual_end_date$$ ||
$$ GROUP BY agent_id, portal_user_id$$ ||
$$ ORDER BY agent_id$$;
RETURN query;
END;
$F$
LANGUAGE plpgsql
STABLE;
CREATE OR REPLACE FUNCTION reports.get_agent_activity_breakdown(format TEXT, start_date DATE, end_date DATE) RETURNS SETOF RECORD AS $F$
DECLARE
actual_end_date DATE;
actual_start_date DATE;
query TEXT;
_rec RECORD;
BEGIN
actual_start_date := COALESCE(start_date, '1970-01-01'::DATE);
actual_end_date := COALESCE(end_date, now()::DATE);
query := reports.build_activity_query(format, reports.get_activities_for_report());
FOR _rec IN EXECUTE query LOOP
RETURN NEXT _rec;
END LOOP;
END
$F$
LANGUAGE plpgsql;
This builds queries that look (roughly) like this:
SELECT agent_id,
portal_user_id,
SUM(ended - started) AS total,
SUM(CASE WHEN activity_id = 3 THEN ended - started END) AS "Label 1",
SUM(CASE WHEN activity_id = 4 THEN ended - started END) AS "Label 2"
FROM public.activity_log_final
LEFT JOIN agent USING (agent_id)
WHERE started::DATE BETWEEN actual_start_date AND actual_end_date
GROUP BY agent_id, portal_user_id
ORDER BY agent_id
When I try to call the get_agent_activity_breakdown() function, I get this error:
psql:2009-10-22_agent_activity_report_test.sql:179: ERROR: a column definition list is required for functions returning "record"
CONTEXT: SQL statement "SELECT * FROM reports.get_agent_activity_breakdown('percentage', NULL, NULL)"
PL/pgSQL function "test_agent_activity" line 92 at SQL statement
The trick is, of course, that the columns labeled 'Label 1' and 'Label
2' are dependent on the set of activities defined in the contents of the
activity table, which I cannot predict when calling the function. How
can I create a function to access this information?
If you really want to create such a table dynamically, maybe just create a temporary table within the function so it can have any columns you want. Let the function insert all rows into the table instead of returning them. The function can return only the name of the table, or you can use one fixed table name that you know. After running the function you can just select data from the table. The function should also check whether the temporary table already exists and, if so, drop or truncate it.
Simon's answer might be better overall in the end, I'm just telling you how to do it without changing what you've got.
From the docs:
from_item can be one of:
...
function_name ( [ argument [, ...] ] ) [ AS ] alias [ ( column_alias [, ...] | column_definition [, ...] ) ]
function_name ( [ argument [, ...] ] ) AS ( column_definition [, ...] )
Further down, it says:
If the function has been defined as
returning the record data type, then
an alias or the key word AS must be
present, followed by a column
definition list in the form (
column_name data_type [, ... ] ). The
column definition list must match the
actual number and types of columns
returned by the function.
I think the alias thing is only an option if you've predefined a type somewhere (like if you're mimicking the output of a predefined table, or have actually used CREATE TYPE... don't quote me on that, though.)
So, I think you would need something like:
SELECT *
FROM reports.get_agent_activity_breakdown('percentage', NULL, NULL)
AS (agent_id integer, portal_user_id integer, total something, ...)
The problem for you lies in the .... You'll need to know before you execute the query the names and types of all the columns--so you'll end up selecting on public.activity twice.
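As a self-contained illustration of the column definition list, here is a minimal sketch with a made-up function demo_records() that returns one hard-coded row; the column names and types are assumptions chosen for the demo, not the real report columns:

```sql
-- Hypothetical function returning SETOF record from a dynamic query.
CREATE FUNCTION demo_records() RETURNS SETOF record
LANGUAGE plpgsql AS
$$
BEGIN
    RETURN QUERY EXECUTE 'SELECT 1 AS agent_id, interval ''2 hours'' AS total';
END
$$;

-- Without a column definition list this call is rejected:
-- SELECT * FROM demo_records();

-- The caller spells out names and types, which must match what is returned.
SELECT * FROM demo_records() AS t(agent_id integer, total interval);
```

The types in the list have to match the returned row exactly, which is why the dynamic column set has to be known to the caller before the call is made.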
Both Simon's and Kev's answers are good ones, but what I ended up doing was splitting the calls to the database into two queries:
Build the query using the query constructor methods I included in the question, returning that to the application.
Call the query directly, and return that data.
This is safe in my case because the dynamic column list is not subject to frequent change, so I don't need to worry about the query's target data changing in between these calls. Otherwise, though, my method might not work.
You cannot change the number of output columns, but you can use a refcursor and return an opened cursor.
More at http://okbob.blogspot.com/2008/08/using-cursors-for-generating-cross.html
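A minimal sketch of the refcursor approach, independent of the linked article: the function name demo_open_cursor and the portal name demo_cur are made up, and the query text is assumed to come from trusted code, since it is executed as-is.

```sql
-- Hypothetical helper: opens a cursor over an arbitrary (trusted) query.
CREATE FUNCTION demo_open_cursor(query text) RETURNS refcursor
LANGUAGE plpgsql AS
$$
DECLARE
    cur refcursor := 'demo_cur';   -- fixed portal name so the caller can FETCH from it
BEGIN
    OPEN cur FOR EXECUTE query;
    RETURN cur;
END
$$;

BEGIN;  -- the cursor only lives until the end of the transaction
SELECT demo_open_cursor('SELECT 1 AS a, 2 AS b, 3 AS c');
FETCH ALL IN demo_cur;   -- column list is whatever the query produced
COMMIT;
```

Because the result shape is only fixed when the cursor is fetched, the function signature stays the same no matter how many columns the query builds.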