Postgres dynamically select all text columns - subquery?

I want a select that returns all fields in a table that are of type = "character varying". It needs to run across multiple tables, so needs to be dynamic.
I was trying to use a subquery to first get the text columns, and then run the query:
SELECT (SELECT STRING_AGG(QUOTE_IDENT(column_name), ', ') FROM
information_schema.columns WHERE table_name = 'foo'
AND data_type = 'character varying') FROM foo;
But that's not working: I just get a list of column names, not the values. Does anyone know how I can make it work, or a better way to do it?
Thank you,
Ben

You need PL/pgSQL for this, as PostgreSQL doesn't support dynamic SQL in its plain SQL dialect.
CREATE OR REPLACE FUNCTION get_cols(target_table text) RETURNS SETOF record AS $$
DECLARE
cols text;
BEGIN
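cols := (SELECT STRING_AGG(QUOTE_IDENT(column_name), ', ' ORDER BY ordinal_position)
FROM information_schema.columns
WHERE table_name = target_table
AND table_schema = current_schema() -- ignore same-named tables in other schemas
AND data_type = 'character varying');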
RETURN QUERY EXECUTE 'SELECT '||cols||' FROM '||quote_ident(target_table)||';';
END;
$$
LANGUAGE plpgsql;
However, you'll find this hard to call, as you need to know the result column list to be able to call it. That kind of defeats the point. You'll need to massage the result into a concrete type. I convert to hstore here (this needs the hstore extension: CREATE EXTENSION hstore;), but you could return json or an array or whatever, really:
CREATE OR REPLACE FUNCTION get_cols(target_table text) RETURNS SETOF hstore AS $$
DECLARE
cols text;
BEGIN
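cols := (SELECT STRING_AGG(QUOTE_IDENT(column_name), ', ' ORDER BY ordinal_position)
FROM information_schema.columns
WHERE table_name = target_table
AND table_schema = current_schema() -- ignore same-named tables in other schemas
AND data_type = 'character varying');
-- select from a subquery so the hstore keys are the real column names
-- (hstore(ROW(...)) would name them f1, f2, ...)
RETURN QUERY EXECUTE 'SELECT hstore(t) FROM (SELECT '||cols||' FROM '||quote_ident(target_table)||') t;';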
END;
$$
LANGUAGE plpgsql;
Dynamic SQL is a pain; consider doing this at the application level.
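Following the suggestion to do this at the application level, the column list can be assembled in client code. This is a minimal Python sketch, not the answer's method: it assumes the (column_name, data_type) rows were already fetched from information_schema.columns, and its quote_ident only approximates PostgreSQL's function of the same name (the real one checks the full keyword list; this sketch checks a small sample of reserved words).

```python
import re
from typing import List, Optional, Tuple

# Rough imitation of PostgreSQL's quote_ident(): simple lowercase names stay bare,
# anything else is double-quoted with embedded double quotes doubled.
# Simplification: only a small sample of reserved keywords is recognised.
_SIMPLE_IDENT = re.compile(r"[a-z_][a-z0-9_]*")
_SAMPLE_RESERVED = {"select", "from", "where", "table", "user", "order", "group"}


def quote_ident(name: str) -> str:
    if _SIMPLE_IDENT.fullmatch(name) and name not in _SAMPLE_RESERVED:
        return name
    return '"' + name.replace('"', '""') + '"'


def build_varchar_select(table: str, columns: List[Tuple[str, str]]) -> Optional[str]:
    """Return 'SELECT <varchar columns> FROM <table>', or None if there are none.

    `columns` holds (column_name, data_type) pairs in ordinal order.
    """
    varchar_cols = [quote_ident(name) for name, data_type in columns
                    if data_type == "character varying"]
    if not varchar_cols:
        return None  # no varchar columns: nothing useful to select
    return "SELECT " + ", ".join(varchar_cols) + " FROM " + quote_ident(table)


# Hypothetical metadata rows for a table named foo.
cols = [("id", "integer"), ("name", "character varying"),
        ("Email Address", "character varying"), ("user", "character varying")]
print(build_varchar_select("foo", cols))
# SELECT name, "Email Address", "user" FROM foo
```

The resulting string is sent as an ordinary query, so the client gets real column names and values back instead of an anonymous record.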

Related

How to include quote in plpgsql function

The following function identifies columns with null values. How can I extend the WHERE clause to check for null or empty values, i.e. coalesce(TRIM(string), '') = ''?
CREATE OR REPLACE FUNCTION public.is_column_empty(IN table_name varchar, IN column_name varchar)
RETURNS bool
LANGUAGE plpgsql
AS $function$
declare
count integer;
BEGIN
execute FORMAT('SELECT COUNT(*) from %s WHERE %s IS NOT NULL', table_name, quote_ident(column_name)) into count;
RETURN (count = 0);
END;
$function$
;
There are more possibilities; for example, you can use dollar quoting with a custom tag ($_$), so the quotes inside the query string need no doubling:
CREATE OR REPLACE FUNCTION public.is_column_empty(IN table_name varchar,
IN column_name varchar)
RETURNS bool
LANGUAGE plpgsql
AS $function$
DECLARE _found boolean; /* avoid "count" as a variable name: it is a keyword */
BEGIN
EXECUTE format($_$SELECT EXISTS(SELECT * FROM %I WHERE COALESCE(trim(%I), '') <> '')$_$,
table_name, column_name)
INTO _found;
RETURN NOT _found;
END;
$function$;
Your example has more issues:
Don't use count(*) when you only need to know whether any row exists. Counting can be pretty slow on bigger tables, while EXISTS stops at the first match.
Keywords are usually written in uppercase.
Don't use variable names that are SQL or PL/pgSQL keywords (reserved or unreserved); they can cause problems in some contexts (count, user, ...).
This is a classic example of messy data: you should disallow empty strings in the data (for example, with a CHECK constraint). Then you can use an index and the plain predicate colname IS NOT NULL, which can be pretty fast.
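The EXISTS version can be tried without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption: COALESCE, TRIM and EXISTS behave the same way for this predicate, but SQLite is not PostgreSQL. The demo table and its columns are made up.

```python
import sqlite3


def is_column_empty(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """True if every row has NULL or whitespace-only text in `column`.

    Identifiers are double-quoted with embedded quotes doubled (roughly what
    format('%I') does in PostgreSQL); EXISTS stops at the first non-empty value
    instead of counting every row.
    """
    def q(ident: str) -> str:
        return '"' + ident.replace('"', '""') + '"'

    sql = (f"SELECT EXISTS (SELECT 1 FROM {q(table)} "
           f"WHERE COALESCE(TRIM({q(column)}), '') <> '')")
    (found,) = conn.execute(sql).fetchone()
    return not found


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (blank_col TEXT, filled_col TEXT)")
conn.executemany("INSERT INTO demo VALUES (?, ?)",
                 [(None, None), ("   ", "x"), ("", None)])
print(is_column_empty(conn, "demo", "blank_col"))   # True
print(is_column_empty(conn, "demo", "filled_col"))  # False
```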
You need to double up the quotation marks, like this:
CREATE OR REPLACE FUNCTION public.is_column_empty(IN table_name varchar, IN column_name varchar)
RETURNS bool
LANGUAGE plpgsql
AS $function$
declare
count integer;
BEGIN
execute FORMAT('SELECT COUNT(*) from %s WHERE COALESCE(TRIM(%s),'''') <> ''''', table_name, quote_ident(column_name)) into count;
RETURN (count = 0);
END;
$function$
;
EDIT:
Re-reading your question, I'm a little unsure whether this is what you want. As it stands, the function returns false if at least one row has a value in the given column, even if all the other rows are empty. Is this really what you want, or are you rather looking for columns where any row has this column empty?
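The quote-doubling rule is mechanical, which is why format('%L') and quote_literal() exist. As an illustration only, here is a small Python function that approximates quote_literal() for non-NULL text: single quotes are doubled and, if the text contains backslashes, those are doubled too and the E'...' escape-string form is used.

```python
def quote_literal(text: str) -> str:
    """Approximate PostgreSQL's quote_literal() for a non-NULL string."""
    escaped = text.replace("'", "''")
    if "\\" in escaped:
        # Backslashes are doubled as well, and an E'...' escape string is produced.
        return "E'" + escaped.replace("\\", "\\\\") + "'"
    return "'" + escaped + "'"


# The dynamic statement from the answer, before it is embedded in a
# single-quoted string: every ' inside it has to be doubled.
inner = "SELECT COUNT(*) FROM t WHERE COALESCE(TRIM(c),'') <> ''"
print(quote_literal(inner))
# 'SELECT COUNT(*) FROM t WHERE COALESCE(TRIM(c),'''') <> '''''
```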

Iterate through column names to get counts in a PL/pgSQL function

I have a table in my Postgres database whose fill rates I'm trying to determine (that is, how often data is or isn't missing). I need a function that, for each column (in a list of a couple dozen columns I've selected), counts the number and percentage of rows with non-null values.
The problem is, I don't really know how to iterate through a list of columns in a programmatic way, because I don't know how to reference a column from a string of its name. I've read about how you can use the EXECUTE command to run dynamically-written SQL, but I haven't been able to get it to work. Here's my current function:
CREATE OR REPLACE FUNCTION get_fill_rates() RETURNS TABLE (field_name text, fill_count integer, fill_percentage float) AS $$
DECLARE
fields text[] := array['column_a', 'column_b', 'column_c'];
total_rows integer;
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = 'my_table';
FOR i IN array_lower(fields, 1) .. array_upper(fields, 1)
LOOP
field_name := fields[i];
EXECUTE 'SELECT COUNT(*) FROM my_table WHERE $1 IS NOT NULL' INTO fill_count USING field_name;
fill_percentage := fill_count::float / total_rows::float;
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM get_fill_rates() ORDER BY fill_count DESC;
This function, as written, returns every field as having a 100% fill rate, which I know to be false. How can I make this function work?
I know you already solved it, but let me suggest that you avoid concatenating identifiers into dynamic queries; use format() with the %I identifier placeholder instead:
CREATE OR REPLACE FUNCTION get_fill_rates() RETURNS TABLE (field_name text, fill_count integer, fill_percentage float) AS $$
DECLARE
fields text[] := array['column_a', 'column_b', 'column_c'];
table_name name := 'my_table';
total_rows integer;
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = table_name;
FOREACH field_name IN ARRAY fields
LOOP
EXECUTE format('SELECT COUNT(*) FROM %I WHERE %I IS NOT NULL', table_name, field_name) INTO fill_count;
fill_percentage := fill_count::float / total_rows::float;
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Doing it this way helps prevent SQL injection attacks and reduces query parse overhead a bit.
I figured out the solution after I wrote my question but before I submitted it; since I've already done the work of writing the question, I'll go ahead and share the answer. The problem was in my EXECUTE statement, specifically the USING field_name part. USING passes field_name as a value (a string literal), not as a column reference, so the query was evaluating whether 'column_a' IS NOT NULL, which of course is always true.
Instead of parameterizing the column name, I need to inject it directly into the query string. So, I changed my EXECUTE line to the following:
EXECUTE 'SELECT COUNT(*) FROM my_table WHERE ' || field_name || ' IS NOT NULL' INTO fill_count;
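The original bug is easy to reproduce, because placeholders bind values, never identifiers. This sketch uses Python's sqlite3 module as a stand-in for PostgreSQL's EXECUTE ... USING (an assumption, but parameter binding behaves the same way in both): binding the column name compares a constant string with NULL, so every row matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column_a TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?)", [("x",), (None,), (None,)])

# Wrong: the placeholder receives the *string* 'column_a', which is never NULL.
(bound,) = conn.execute(
    "SELECT COUNT(*) FROM my_table WHERE ? IS NOT NULL", ("column_a",)
).fetchone()

# Right: the column name becomes part of the SQL text, quoted as an identifier.
column = "column_a"
quoted = '"' + column.replace('"', '""') + '"'
(injected,) = conn.execute(
    f"SELECT COUNT(*) FROM my_table WHERE {quoted} IS NOT NULL"
).fetchone()

print(bound, injected)  # 3 1
```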
Some problems in the code aside (see below), this can be substantially faster and simpler with a single scan over the table in a plain query:
SELECT v.*
FROM (
SELECT count(column_a) AS ct_column_a
, count(column_b) AS ct_column_b
, count(column_c) AS ct_column_c
, count(*)::numeric AS ct
FROM my_table
) sub
, LATERAL (
VALUES
(text 'column_a', ct_column_a, round(ct_column_a / ct, 3))
, (text 'column_b', ct_column_b, round(ct_column_b / ct, 3))
, (text 'column_c', ct_column_c, round(ct_column_c / ct, 3))
) v(field_name, fill_count, fill_percentage);
The crucial "trick" here is that count(expression) only counts non-null values to begin with, while count(*) counts all rows; no further tricks required.
I rounded the fraction to 3 decimal digits, which is optional. For this I cast to numeric.
A VALUES expression in a LATERAL join unpivots the results to get one row per field.
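The single-scan idea can be checked quickly with made-up data. This sketch uses Python's sqlite3 as a stand-in for PostgreSQL; SQLite has no LATERAL, so the unpivot step is done in Python rather than with VALUES. The column names are assumed to be simple identifiers without embedded double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column_a TEXT, column_b TEXT, column_c TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [("a", None, None), ("b", "x", None), (None, "y", None), ("d", "z", None)],
)

fields = ["column_a", "column_b", "column_c"]
# One pass over the table: count(col) skips NULLs, count(*) counts every row.
select_list = ", ".join(f'count("{f}")' for f in fields)
*counts, total = conn.execute(f"SELECT {select_list}, count(*) FROM my_table").fetchone()

# Unpivot: one (field, fill_count, fill_fraction) tuple per column;
# None instead of dividing by zero on an empty table.
fill_rates = [(f, c, round(c / total, 3) if total else None)
              for f, c in zip(fields, counts)]
for rate in fill_rates:
    print(rate)
```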
For repeated use, or if you have a long list of columns to process, you can generate and execute the query dynamically. But, again, don't run a separate count for each column. Just build the above query dynamically:
CREATE OR REPLACE FUNCTION get_fill_rates(tbl regclass, fields text[])
RETURNS TABLE (field_name text, fill_count bigint, fill_percentage numeric) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
-- RAISE NOTICE '%', ( -- to debug if needed
SELECT
'SELECT v.*
FROM (
SELECT count(*)::numeric AS ct
, ' || string_agg(format('count(%I) AS %I', fld, 'ct_' || fld), ', ') || '
FROM ' || tbl || '
) sub
, LATERAL (
VALUES
(text ' || string_agg(format('%L, %2$I, round(%2$I/ ct, 3))', fld, 'ct_' || fld), ', (') || '
) v(field_name, fill_count, fill_pct)
ORDER BY v.fill_count DESC'
FROM unnest(fields) fld
);
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM get_fill_rates('my_table', '{column_a, column_b, column_c}');
As you can see, this works for any given table and column list now.
And all identifiers are properly quoted automatically, using format() or by the built-in virtues of the regclass type.
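To see what the function sends to EXECUTE (the commented-out RAISE NOTICE line serves the same purpose inside the database), the query text can also be assembled outside it. This is a hypothetical Python helper: it always double-quotes identifiers, whereas %I quotes only when needed, so the text differs cosmetically but names the same columns; and unlike regclass it takes a bare table name, not a schema-qualified one.

```python
from typing import List


def quote_ident(name: str) -> str:
    # Always double-quote. PostgreSQL's %I quotes only when needed, so its text
    # looks different, but both forms name the same column.
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text: str) -> str:
    # Assumes standard_conforming_strings = on (the default), so backslashes
    # need no special treatment inside '...'.
    return "'" + text.replace("'", "''") + "'"


def fill_rate_query(table: str, fields: List[str]) -> str:
    """Build the single-scan fill-rate query for `table` and `fields`."""
    counts = ", ".join(
        "count({}) AS {}".format(quote_ident(f), quote_ident("ct_" + f)) for f in fields
    )
    rows = ", ".join(
        "({}::text, {ct}, round({ct} / ct, 3))".format(quote_literal(f), ct=quote_ident("ct_" + f))
        for f in fields
    )
    return (
        "SELECT v.* FROM (SELECT count(*)::numeric AS ct, " + counts
        + " FROM " + quote_ident(table) + ") sub, LATERAL (VALUES " + rows
        + ") v(field_name, fill_count, fill_pct) ORDER BY v.fill_count DESC"
    )


print(fill_rate_query("my_table", ["column_a", "Column B"]))
```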
Related:
Table name as a PostgreSQL function parameter
How to unpivot a table in PostgreSQL
Query for crosstab view
Convert one row into multiple rows with fewer columns
Your original query could be improved like this, but this is just lipstick on a pig. Do not use this inefficient approach.
CREATE OR REPLACE FUNCTION get_fill_rates()
RETURNS TABLE (field_name text, fill_count bigint, fill_percentage float) AS
$$
DECLARE
fields text[] := '{column_a, column_b, column_c}'; -- must be legal identifiers!
total_rows float; -- use float right away
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = 'my_table';
FOREACH field_name IN ARRAY fields -- use FOREACH
LOOP
EXECUTE 'SELECT COUNT(*) FROM my_table WHERE ' || field_name || ' IS NOT NULL'
INTO fill_count;
fill_percentage := fill_count / total_rows; -- already type float
RETURN NEXT;
END LOOP;
END
$$ LANGUAGE plpgsql;
Plus, pg_class.reltuples is only an estimate. Since you are counting anyway, use an actual count.
Related:
Iterating over integer[] in PL/pgSQL
Fast way to discover the row count of a table in PostgreSQL

PL/pgSQL: store all table names in an array

My main goal is to find all tables whose names contain 'Messdaten' (for example "ID: 843063334 CH: 0001 Messdaten") and create copies of them with CREATE TABLE AS, named 'Backup_Messdaten1', 'Backup_Messdaten2', etc.
First I tried to store all table names without filtering (maybe there is a way to retrieve only the names containing 'Messdaten' with an SQL query, I don't know), then to store the ones containing 'Messdaten' in another array and use that array in the CREATE TABLE AS command.
But, as I said, my first goal is just to store all table names in an array.
Code:
CREATE OR REPLACE FUNCTION retrieve()
RETURNS text[] AS
$BODY$DECLARE
tbl_names text[];
BEGIN
tbl_names := array(SELECT table_name FROM information_schema.tables WHERE
table_schema='public' AND table_type='BASE TABLE');
SELECT tbl_names[i] FROM generate_subscripts(tbl_names, 1) g(i);
END;$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION retrieve()
OWNER TO postgres;
But with the code above, I get this error:
ERROR: could not find array type for data type information_schema.sql_identifier
SQL state: 42704
Context: SQL statement "SELECT array(SELECT table_name FROM information_schema.tables WHERE table_schema='public' AND table_type='BASE TABLE')"
PL/pgSQL function retrieve() line 4 at assignment
Do you have any idea what is wrong with it? And since I've explained my main goal, I'd appreciate it if you could point me in the right direction for that too.
SELECT array_agg(table_name::text)
FROM information_schema.tables
WHERE table_schema='public' AND table_type='BASE TABLE';
You need to cast the table name to text. The subquery is unnecessary: array_agg() does the job directly, without the ARRAY(...) constructor.
Personally I don't see why you need to aggregate them into an array at all, though. I'd just:
DO $$
DECLARE
tablename text;
BEGIN
FOR tablename IN
SELECT table_name FROM information_schema.tables
WHERE table_schema='public' AND table_type='BASE TABLE'
AND table_name LIKE '%Messdaten%' -- add your extra filters here
LOOP
EXECUTE format('CREATE TABLE public.%I AS TABLE public.%I', tablename || '_backup', tablename);
END LOOP;
END;
$$;
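The question actually asked for numbered names ('Backup_Messdaten1', 'Backup_Messdaten2', ...) rather than a _backup suffix. A hypothetical Python sketch of generating those statements client-side, assuming the table names were already fetched; every identifier is double-quoted because names like "ID: 843063334 CH: 0001 Messdaten" contain spaces, colons and capitals. Sorting the names first is an assumption, made only to keep the numbering repeatable.

```python
from typing import List


def quote_ident(ident: str) -> str:
    # Always double-quote and double embedded quotes; needed for these names.
    return '"' + ident.replace('"', '""') + '"'


def backup_statements(table_names: List[str], marker: str = "Messdaten") -> List[str]:
    """Return one CREATE TABLE ... AS TABLE statement per table containing `marker`."""
    statements = []
    matching = sorted(name for name in table_names if marker in name)
    for i, name in enumerate(matching, start=1):
        backup = f"Backup_{marker}{i}"
        statements.append(
            f"CREATE TABLE public.{quote_ident(backup)} AS TABLE public.{quote_ident(name)}"
        )
    return statements


names = ["ID: 843063334 CH: 0001 Messdaten", "customers",
         "ID: 843063334 CH: 0002 Messdaten"]
for stmt in backup_statements(names):
    print(stmt)
```

Note that the backup tables also contain "Messdaten" in their names, so running the same filter a second time would pick them up too.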
Your code contains more errors. The basic one is that the PL/pgSQL function has no RETURN statement (and a bare SELECT without INTO is not allowed in PL/pgSQL either). You can also use an SQL-language function (see my example).
Postgres doesn't support arrays of some types, and sql_identifier is one of them. Cast to a basic type instead, in this case text.
CREATE OR REPLACE FUNCTION names(filter text)
RETURNS text[] AS $$
SELECT array_agg(table_name::text)
FROM information_schema.tables
WHERE table_schema='public'
AND table_type='BASE TABLE' AND table_name LIKE $1;
$$ LANGUAGE sql;
postgres=# select names('foo%');
names
------------
{foo1,foo}
(1 row)
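One caveat with names(filter): in LIKE, _ and % are wildcards, and table names such as Backup_Messdaten1 contain underscores. A minimal, hypothetical Python helper that builds a "contains" pattern matching the text literally, assuming the server uses LIKE's default escape character (backslash) and the pattern is passed as a bound parameter rather than pasted into the SQL.

```python
def like_contains(text: str, escape: str = "\\") -> str:
    """Build a LIKE pattern that matches strings containing `text` literally.

    Assumes the default LIKE escape character (backslash) on the server side.
    """
    escaped = (
        text.replace(escape, escape + escape)  # escape the escape character first
        .replace("%", escape + "%")            # % would match any sequence
        .replace("_", escape + "_")            # _ would match any single character
    )
    return "%" + escaped + "%"


print(like_contains("Backup_Messdaten"))  # %Backup\_Messdaten%
```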

Returning results from a function in 'select statement' format

I have a function that looks like this:
CREATE OR REPLACE FUNCTION mffcu.test_ty_hey()
RETURNS setof record
LANGUAGE plpgsql
AS $function$
Declare
cname1 text;
sql2 text;
Begin
for cname1 in
select array_to_string(useme, ', ') from (
select array_agg(column_name) as useme
from(
select column_name::text
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'crosstab_183'
and ordinal_position != 1
) as fin
) as fine
loop
sql2 := 'select distinct array['|| cname1 ||'] from mffcu.crosstab_183';
execute sql2;
end loop;
END;
$function$
I call the function with this:
select mffcu.test_ty_hey()
How do I return the results of the sql2 query without creating a table/temporary table?
While @Pavel is right, of course, your very convoluted function could be untangled to:
CREATE OR REPLACE FUNCTION mffcu.test_ty_hey()
RETURNS SETOF text[] LANGUAGE plpgsql
AS $func$
DECLARE
cname1 text;
BEGIN
FOR cname1 IN
SELECT column_name::text
FROM information_schema.columns
WHERE table_name = 'crosstab_183'
AND table_schema = 'mffcu'
AND ordinal_position <> 1
LOOP
RETURN QUERY
EXECUTE format('SELECT DISTINCT ARRAY[%I::text]
FROM mffcu.crosstab_183', cname1);
END LOOP;
END
$func$;
format() requires PostgreSQL 9.1 or later. In 9.0 you can substitute with:
EXECUTE 'SELECT DISTINCT ARRAY['|| quote_ident(cname1) ||'::text]
FROM mffcu.crosstab_183';
Call:
select * FROM mffcu.test_ty_hey();
By casting each column to text we arrive at a consistent data type that can be used to declare the RETURN type. This compromise has to be made to return various data types from one function. Every data type can be cast to text, so that's the obvious common ground.
BTW, I have trouble imagining what the ARRAY wrapper around every single value should be good for. I suppose you could just drop that.
PostgreSQL functions must have a fixed result type before execution; you cannot determine the type late, during execution. There are only two workarounds: temp tables or cursors.
PL/pgSQL is not good for overly generic routines. It is good for implementing strict, clean business rules, and bad for generic crosstab calculations, generic auditing, or similar generic tasks. It works, but the code is slower and usually hard to maintain.
But to answer your question: you can use output cursors, for example:
http://okbob.blogspot.cz/2008/08/using-cursors-for-generating-cross.html

Select from dynamic table names

Consider this query.
SELECT app_label || '_' || model as name from django_content_type where id = 12;
name
-------------------
merc_benz
As Django people might have guessed, 'merc_benz' is a table name in the same database. I want to write some SQL migrations and I need to select results from such dynamic table names.
How can I use a variable as a table name?
Something like this (see RETURN QUERY EXECUTE in the PL/pgSQL chapter of the manual):
CREATE function dynamic_table_select(v_id int) returns setof text as $$
DECLARE
v_table_name text;
BEGIN
SELECT app_label || '_' || model into
v_table_name from django_content_type where id = v_id;
RETURN QUERY EXECUTE 'SELECT a_text_column from '||quote_ident(v_table_name);
RETURN;
END
$$ LANGUAGE plpgsql;
It becomes a little more complex if you want to return more than a single column of one type: either create a representative TYPE or, if you're returning all the columns of a table, use the composite type that already exists under that table's name. You could also specify multiple OUT parameters.
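If the lookup happens in migration code on the application side instead, the same two steps apply: build the table name from app_label and model, then quote it as an identifier before splicing it into the SQL. A hypothetical Python sketch; a_text_column is the placeholder column from the answer above, Django's default table naming is assumed (a model with Meta.db_table set would need that value instead), and identifiers are always double-quoted.

```python
def content_type_select(app_label: str, model: str, column: str = "a_text_column") -> str:
    """Build a SELECT against the table behind a Django content type.

    Assumes Django's default table naming (<app_label>_<model>).
    """
    def quote_ident(name: str) -> str:
        # Always double-quote and double embedded quotes, like a cautious %I.
        return '"' + name.replace('"', '""') + '"'

    table = app_label + "_" + model
    return "SELECT " + quote_ident(column) + " FROM " + quote_ident(table)


print(content_type_select("merc", "benz"))  # SELECT "a_text_column" FROM "merc_benz"
```

Because the name is quoted rather than pasted in raw, a hostile model value ends up as part of one (nonexistent) table name instead of extra SQL.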
The basic answer, I think, is EXECUTE IMMEDIATE. Note that this is ECPG (embedded SQL in C), not plain SQL or PL/pgSQL:
http://www.postgresql.org/docs/8.1/static/ecpg-dynamic.html