dynamic sql selection in nested select - sql

Let's say that I have a tables tasks, projects, and work_items, all of which have a column fields containing a json object of custom values.
Now lets say I want to write a function to query an arbitrary table for its field names.
CREATE OR REPLACE FUNCTION getFieldNames(varchar) RETURNS varchar[] AS
$BODY$
DECLARE
fieldNames varchar[];
BEGIN
fieldNames := ARRAY(SELECT DISTINCT fieldName FROM
(EXECUTE 'SELECT json_object_keys(fields) AS fieldName FROM '
|| quote_ident($1)
) AS derivedFields
);
RETURN fieldNames;
END
$BODY$
LANGUAGE 'plpgsql';
This however errors out with:
ERROR: syntax error at or near "'SELECT json_object_keys(fields) AS fieldName FROM'"
LINE 8: (EXECUTE 'SELECT json_object_keys(fields) AS fieldNa..
The Nested select itself is sound as I verified by replacing the execute with
(SELECT json_object_keys(fields) AS fieldName
FROM tasks
)
and receiving correct results.
What is wrong with my code?

The EXECUTE statement does not return a relation that you can use as a sub-query. Instead, if it returns anything at all, it populates a variable or a single row through the INTO clause. The latter obviously does not match your requirement so you are stuck with the first. A more elegant solution is to move the EXECUTE statement outwards:
EXECUTE
'SELECT array_agg(DISTINCT fields) FROM ' ||
'(SELECT json_object_keys(fields) AS fields FROM ' || quote_ident($1) || ') AS x'
INTO fieldNames;

The worst issue is wrong position of EXECUTE statement. It is a PLpgSQL statement and then it cannot be placed in any SQL expression.

Related

Infer row type from table in postgresql

My application uses multiple schemas to partition tenants across the database to improve performance. I am trying to create a plpgsql function that will give me an arbitrary result set based on the union of all application schemas given a table. Here is what I have so far (inspired by this blog post):
CREATE OR REPLACE FUNCTION app_union(tbl text) RETURNS SETOF RECORD AS $$
DECLARE
schema RECORD;
sql TEXT := '';
BEGIN
FOR schema IN EXECUTE 'SELECT distinct schema FROM tenants' LOOP
sql := sql || format('SELECT * FROM %I.%I %s UNION ALL ', schema.schema, tbl);
END LOOP;
RETURN QUERY EXECUTE left(sql, -11);
END
$$ LANGUAGE plpgsql;
This works great, but has to be called with a row type definition at the end:
select * from app_union('my_table') t(id uuid, name text, ...);
So, how can I call my function without providing a row type?
I know that I can introspect my tables using information_schema.columns, but I'm not sure how to dynamically generate the type declaration without a lot of case statements (columns doesn't report the definition sql the way that e.g., pg_indexes does).
Even if I could dynamically generate the row declaration, it seems I would have to append it to my former function call as dynamic sql anyway, which sort of chicken/eggs the problem of returning a result set of an arbitrary type from a function.
Instead of providing the table as a string, you could provide it as type anyelement to specify the actual type of the returning data, then infer the table's name using pg_typeof. You can also use string_agg rather than a loop to build your sql:
CREATE OR REPLACE FUNCTION app_union(tbl anyelement)
RETURNS setof anyelement AS $$
BEGIN
return query execute string_agg(
distinct format('select * from %I.%I', schema, pg_typeof(tbl)::text),
' union all '
) from tenants;
END
$$ LANGUAGE plpgsql;
select * from app_union(null::my_table);
Simplified example

Truncate if exists in psql function and call function

I have the following code to create a function that truncates all rows from the table web_channel2 if the table is not empty:
create or replace function truncate_if_exists(tablename text)
returns void language plpgsql as $$
begin
select
from information_schema.tables
where table_name = tablename;
if found then
execute format('truncate %I', tablename);
end if;
end $$;
Unfortunately I don't know how should I continue ...
How to execute the function?
TLDR
To execute a Postgres function (returning void), call it with SELECT:
SELECT truncate_if_exists('web_channel2');
Proper solution
... how should I continue?
Delete the function again.
DROP FUNCTION truncate_if_exists(text);
It does not offer any way to schema-qualify the table. Using it might truncate the wrong table ...
Looks like you are trying to avoid an exception if the table is not there.
And you only want to truncate ...
if the table is not empty
To that end, I might use a safe function like this:
CREATE OR REPLACE FUNCTION public.truncate_if_exists(_table text, _schema text DEFAULT NULL)
RETURNS text
LANGUAGE plpgsql AS
$func$
DECLARE
_qual_tbl text := concat_ws('.', quote_ident(_schema), quote_ident(_table));
_row_found bool;
BEGIN
IF to_regclass(_qual_tbl) IS NOT NULL THEN -- table exists
EXECUTE 'SELECT EXISTS (SELECT FROM ' || _qual_tbl || ')'
INTO _row_found;
IF _row_found THEN -- table is not empty
EXECUTE 'TRUNCATE ' || _qual_tbl;
RETURN 'Table truncated: ' || _qual_tbl;
ELSE -- optional!
RETURN 'Table exists but is empty: ' || _qual_tbl;
END IF;
ELSE -- optional!
RETURN 'Table not found: ' || _qual_tbl;
END IF;
END
$func$;
To execute, call it with SELECT:
SELECT truncate_if_exists('web_channel2');
If no schema is provided, the function falls back to traversing the search_path - like your original did. If that's unreliable, or generally, to be safe (which seems prudent when truncating tables!) provide the schema explicitly:
SELECT truncate_if_exists('web_channel2', 'my_schema');
db<>fiddle here
When providing identifiers as strings, you need to use exact capitalization.
Why the custom variable _row_found instead of FOUND? See:
Dynamic SQL (EXECUTE) as condition for IF statement
Basics:
Table name as a PostgreSQL function parameter
How to check if a table exists in a given schema
PL/pgSQL checking if a row exists
How does the search_path influence identifier resolution and the "current schema"
Are PostgreSQL column names case-sensitive?

Iterate through column names to get counts in a PL/pgSQL function

I have a table in my Postgres database that I'm trying to determine fill rates for (that is, I'm trying to understand how often data is/isn't missing). I need to make a function that, for each column (in a list of a couple dozen columns I've selected), counts the number and percentage of columns with non-null values.
The problem is, I don't really know how to iterate through a list of columns in a programmatic way, because I don't know how to reference a column from a string of its name. I've read about how you can use the EXECUTE command to run dynamically-written SQL, but I haven't been able to get it to work. Here's my current function:
CREATE OR REPLACE FUNCTION get_fill_rates() RETURNS TABLE (field_name text, fill_count integer, fill_percentage float) AS $$
DECLARE
fields text[] := array['column_a', 'column_b', 'column_c'];
total_rows integer;
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = 'my_table';
FOR i IN array_lower(fields, 1) .. array_upper(fields, 1)
LOOP
field_name := fields[i];
EXECUTE 'SELECT COUNT(*) FROM my_table WHERE $1 IS NOT NULL' INTO fill_count USING field_name;
fill_percentage := fill_count::float / total_rows::float;
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM get_fill_rates() ORDER BY fill_count DESC;
This function, as written, returns every field as having a 100% fill rate, which I know to be false. How can I make this function work?
I know you already solved it. But let me suggest you to avoid concatenating identifiers on dynamic queries, you can use format with a identifier wildcard instead:
CREATE OR REPLACE FUNCTION get_fill_rates() RETURNS TABLE (field_name text, fill_count integer, fill_percentage float) AS $$
DECLARE
fields text[] := array['column_a', 'column_b', 'column_c'];
table_name name := 'my_table';
total_rows integer;
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = table_name;
FOREACH field_name IN ARRAY fields
LOOP
EXECUTE format('SELECT COUNT(*) FROM %I WHERE %I IS NOT NULL', table_name, field_name) INTO fill_count;
fill_percentage := fill_count::float / total_rows::float;
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Doing this way will help you preventing SQL-injection attacks and will reduce query parse overhead a bit. More info here.
I figured out the solution after I wrote my question but before I submitted it -- since I've already done the work of writing the question, I'll just go ahead and share the answer. The problem was in my EXECUTE statement, specifically with that USING field_name bit. I think it was getting treated as a string literal when I did it that way, which meant the query was evaluating if "a string literal" IS NOT NULL which of course, is always true.
Instead of parameterizing the column name, I need to inject it directly into the query string. So, I changed my EXECUTE line to the following:
EXECUTE 'SELECT COUNT(*) FROM my_table WHERE ' || field_name || ' IS NOT NULL' INTO fill_count;
Some problems in the code aside (see below), this can be substantially faster and simpler with a single scan over the table in a plain query:
SELECT v.*
FROM (
SELECT count(column_a) AS ct_column_a
, count(column_b) AS ct_column_b
, count(column_c) AS ct_column_c
, count(*)::numeric AS ct
FROM my_table
) sub
, LATERAL (
VALUES
(text 'column_a', ct_column_a, round(ct_column_a / ct, 3))
, (text 'column_b', ct_column_b, round(ct_column_b / ct, 3))
, (text 'column_c', ct_column_c, round(ct_column_c / ct, 3))
) v(field_name, fill_count, fill_percentage);
The crucial "trick" here is that count() only counts non-null values to begin with, no tricks required.
I rounded the percentage to 3 decimal digits, which is optional. For this I cast to numeric.
Use a VALUES expression to unpivot the results and get one row per field.
For repeated use or if you have a long list of columns to process, you can generate and execute the query dynamically. But, again, don't run a separate count for each column. Just build above query dynamically:
CREATE OR REPLACE FUNCTION get_fill_rates(tbl regclass, fields text[])
RETURNS TABLE (field_name text, fill_count bigint, fill_percentage numeric) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
-- RAISE NOTICE '%', ( -- to debug if needed
SELECT
'SELECT v.*
FROM (
SELECT count(*)::numeric AS ct
, ' || string_agg(format('count(%I) AS %I', fld, 'ct_' || fld), ', ') || '
FROM ' || tbl || '
) sub
, LATERAL (
VALUES
(text ' || string_agg(format('%L, %2$I, round(%2$I/ ct, 3))', fld, 'ct_' || fld), ', (') || '
) v(field_name, fill_count, fill_pct)
ORDER BY v.fill_count DESC'
FROM unnest(fields) fld
);
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM get_fill_rates('my_table', '{column_a, column_b, column_c}');
As you can see, this works for any given table and column list now.
And all identifiers are properly quoted automatically, using format() or by the built-in virtues of the regclass type.
Related:
Table name as a PostgreSQL function parameter
How to unpivot a table in PostgreSQL
Query for crosstab view
Convert one row into multiple rows with fewer columns
Your original query could be improved like this, but this is just lipstick on a pig. Do not use this inefficient approach.
CREATE OR REPLACE FUNCTION get_fill_rates()
RETURNS TABLE (field_name text, fill_count bigint, fill_percentage float) AS
$$
DECLARE
fields text[] := '{column_a, column_b, column_c}'; -- must be legal identifiers!
total_rows float; -- use float right away
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = 'my_table';
FOREACH field_name IN ARRAY fields -- use FOREACH
LOOP
EXECUTE 'SELECT COUNT(*) FROM big WHERE ' || field_name || ' IS NOT NULL'
INTO fill_count;
fill_percentage := fill_count / total_rows; -- already type float
RETURN NEXT;
END LOOP;
END
$$ LANGUAGE plpgsql;
Plus, pg_class.reltuples is only an estimate. Since you are counting anyway, use an actual count.
Related:
Iterating over integer[] in PL/pgSQL
Fast way to discover the row count of a table in PostgreSQL

Postgres dynamically select all text columns - subquery?

I want a select that returns all fields in a table that are of type = "character varying". It needs to run across multiple tables, so needs to be dynamic.
I was trying to use a subquery to first get the text columns, and then run the query:
SELECT (SELECT STRING_AGG(QUOTE_IDENT(column_name), ', ') FROM
information_schema.columns WHERE table_name = foo
AND data_type = 'character varying') FROM foo;
But that's not working, I just get a list of column names but not the values. Does anyone know how I can make it work or a better way to do it?
Thank you,
Ben
You need Pl/PgSQL for this, as PostgreSQL doesn't support dynamic SQL in its plain SQL dialect.
CREATE OR REPLACE FUNCTION get_cols(target_table text) RETURNS SETOF record AS $$
DECLARE
cols text;
BEGIN
cols := (SELECT STRING_AGG(QUOTE_IDENT(column_name), ', ')
FROM information_schema.columns
WHERE table_name = target_table
AND data_type = 'character varying');
RETURN QUERY EXECUTE 'SELECT '||cols||' FROM '||quote_ident(target_table)||';';
END;
$$
LANGUAGE plpgsql;
However, you'll find this hard to call, as you need to know the result column list to be able to call it. That kind of defeats the point. You'll need to massage the result into a concrete type. I convert to hstore here, but you could return json or an array or whatever, really:
CREATE OR REPLACE FUNCTION get_cols(target_table text) RETURNS SETOF hstore AS $$
DECLARE
cols text;
BEGIN
cols := (SELECT STRING_AGG(QUOTE_IDENT(column_name), ', ')
FROM information_schema.columns
WHERE table_name = target_table
AND data_type = 'character varying');
RETURN QUERY EXECUTE 'SELECT hstore(ROW('||cols||')) FROM '||quote_ident(target_table)||';';
END;
$$
LANGUAGE plpgsql;
Dynamic SQL is a pain, consider doing this at the application level.

Select from dynamic table names

Consider this query.
SELECT app_label || '_' || model as name from django_content_type where id = 12;
name
-------------------
merc_benz
DJango people might have guessed, 'merc_benz' is a table name in same db. I want to write some SQL migrations and I need to select results from such dynamic table names.
How can i use variable name as a table name???
Something like this...(see RETURN QUERY EXECUTE in the plpgsql portion of the manual)
CREATE function dynamic_table_select(v_id int) returns setof text as $$
DECLARE
v_table_name text;
BEGIN
SELECT app_label || '_' || model into
v_table_name from django_content_type where id = v_id;
RETURN QUERY EXECUTE 'SELECT a_text_column from '||quote_ident(v_table_name);
RETURN;
END
$$ LANGUAGE plpgsql;
It becomes a little more complex if you want to return more than a single column of one type - either create a TYPE that is representative, or if you're using all the columns of a table, there's already a TYPE of that table name. You could also specify multiple OUT parameters.
http://www.postgresql.org/docs/8.1/static/ecpg-dynamic.html
The basic answer I think is EXECUTE IMMEDIATE