I have the following code to create a function that truncates all rows from the table web_channel2 if the table is not empty:
create or replace function truncate_if_exists(tablename text)
returns void language plpgsql as $$
begin
select
from information_schema.tables
where table_name = tablename;
if found then
execute format('truncate %I', tablename);
end if;
end $$;
Unfortunately I don't know how should I continue ...
How to execute the function?
TLDR
To execute a Postgres function (returning void), call it with SELECT:
SELECT truncate_if_exists('web_channel2');
Proper solution
... how should I continue?
Delete the function again.
DROP FUNCTION truncate_if_exists(text);
It does not offer any way to schema-qualify the table. Using it might truncate the wrong table ...
Looks like you are trying to avoid an exception if the table is not there.
And you only want to truncate ...
if the table is not empty
To that end, I might use a safe function like this:
CREATE OR REPLACE FUNCTION public.truncate_if_exists(_table text, _schema text DEFAULT NULL)
RETURNS text
LANGUAGE plpgsql AS
$func$
DECLARE
_qual_tbl text := concat_ws('.', quote_ident(_schema), quote_ident(_table));
_row_found bool;
BEGIN
IF to_regclass(_qual_tbl) IS NOT NULL THEN -- table exists
EXECUTE 'SELECT EXISTS (SELECT FROM ' || _qual_tbl || ')'
INTO _row_found;
IF _row_found THEN -- table is not empty
EXECUTE 'TRUNCATE ' || _qual_tbl;
RETURN 'Table truncated: ' || _qual_tbl;
ELSE -- optional!
RETURN 'Table exists but is empty: ' || _qual_tbl;
END IF;
ELSE -- optional!
RETURN 'Table not found: ' || _qual_tbl;
END IF;
END
$func$;
To execute, call it with SELECT:
SELECT truncate_if_exists('web_channel2');
If no schema is provided, the function falls back to traversing the search_path - like your original did. If that's unreliable, or generally, to be safe (which seems prudent when truncating tables!) provide the schema explicitly:
SELECT truncate_if_exists('web_channel2', 'my_schema');
db<>fiddle here
When providing identifiers as strings, you need to use exact capitalization.
Why the custom variable _row_found instead of FOUND? See:
Dynamic SQL (EXECUTE) as condition for IF statement
Basics:
Table name as a PostgreSQL function parameter
How to check if a table exists in a given schema
PL/pgSQL checking if a row exists
How does the search_path influence identifier resolution and the "current schema"
Are PostgreSQL column names case-sensitive?
I have a table in my Postgres database that I'm trying to determine fill rates for (that is, I'm trying to understand how often data is/isn't missing). I need to make a function that, for each column (in a list of a couple dozen columns I've selected), counts the number and percentage of columns with non-null values.
The problem is, I don't really know how to iterate through a list of columns in a programmatic way, because I don't know how to reference a column from a string of its name. I've read about how you can use the EXECUTE command to run dynamically-written SQL, but I haven't been able to get it to work. Here's my current function:
CREATE OR REPLACE FUNCTION get_fill_rates() RETURNS TABLE (field_name text, fill_count integer, fill_percentage float) AS $$
DECLARE
fields text[] := array['column_a', 'column_b', 'column_c'];
total_rows integer;
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = 'my_table';
FOR i IN array_lower(fields, 1) .. array_upper(fields, 1)
LOOP
field_name := fields[i];
EXECUTE 'SELECT COUNT(*) FROM my_table WHERE $1 IS NOT NULL' INTO fill_count USING field_name;
fill_percentage := fill_count::float / total_rows::float;
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM get_fill_rates() ORDER BY fill_count DESC;
This function, as written, returns every field as having a 100% fill rate, which I know to be false. How can I make this function work?
I know you already solved it. But let me suggest you to avoid concatenating identifiers on dynamic queries, you can use format with a identifier wildcard instead:
CREATE OR REPLACE FUNCTION get_fill_rates() RETURNS TABLE (field_name text, fill_count integer, fill_percentage float) AS $$
DECLARE
fields text[] := array['column_a', 'column_b', 'column_c'];
table_name name := 'my_table';
total_rows integer;
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = table_name;
FOREACH field_name IN ARRAY fields
LOOP
EXECUTE format('SELECT COUNT(*) FROM %I WHERE %I IS NOT NULL', table_name, field_name) INTO fill_count;
fill_percentage := fill_count::float / total_rows::float;
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Doing this way will help you preventing SQL-injection attacks and will reduce query parse overhead a bit. More info here.
I figured out the solution after I wrote my question but before I submitted it -- since I've already done the work of writing the question, I'll just go ahead and share the answer. The problem was in my EXECUTE statement, specifically with that USING field_name bit. I think it was getting treated as a string literal when I did it that way, which meant the query was evaluating if "a string literal" IS NOT NULL which of course, is always true.
Instead of parameterizing the column name, I need to inject it directly into the query string. So, I changed my EXECUTE line to the following:
EXECUTE 'SELECT COUNT(*) FROM my_table WHERE ' || field_name || ' IS NOT NULL' INTO fill_count;
Some problems in the code aside (see below), this can be substantially faster and simpler with a single scan over the table in a plain query:
SELECT v.*
FROM (
SELECT count(column_a) AS ct_column_a
, count(column_b) AS ct_column_b
, count(column_c) AS ct_column_c
, count(*)::numeric AS ct
FROM my_table
) sub
, LATERAL (
VALUES
(text 'column_a', ct_column_a, round(ct_column_a / ct, 3))
, (text 'column_b', ct_column_b, round(ct_column_b / ct, 3))
, (text 'column_c', ct_column_c, round(ct_column_c / ct, 3))
) v(field_name, fill_count, fill_percentage);
The crucial "trick" here is that count() only counts non-null values to begin with, no tricks required.
I rounded the percentage to 3 decimal digits, which is optional. For this I cast to numeric.
Use a VALUES expression to unpivot the results and get one row per field.
For repeated use or if you have a long list of columns to process, you can generate and execute the query dynamically. But, again, don't run a separate count for each column. Just build above query dynamically:
CREATE OR REPLACE FUNCTION get_fill_rates(tbl regclass, fields text[])
RETURNS TABLE (field_name text, fill_count bigint, fill_percentage numeric) AS
$func$
BEGIN
RETURN QUERY EXECUTE (
-- RAISE NOTICE '%', ( -- to debug if needed
SELECT
'SELECT v.*
FROM (
SELECT count(*)::numeric AS ct
, ' || string_agg(format('count(%I) AS %I', fld, 'ct_' || fld), ', ') || '
FROM ' || tbl || '
) sub
, LATERAL (
VALUES
(text ' || string_agg(format('%L, %2$I, round(%2$I/ ct, 3))', fld, 'ct_' || fld), ', (') || '
) v(field_name, fill_count, fill_pct)
ORDER BY v.fill_count DESC'
FROM unnest(fields) fld
);
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM get_fill_rates('my_table', '{column_a, column_b, column_c}');
As you can see, this works for any given table and column list now.
And all identifiers are properly quoted automatically, using format() or by the built-in virtues of the regclass type.
Related:
Table name as a PostgreSQL function parameter
How to unpivot a table in PostgreSQL
Query for crosstab view
Convert one row into multiple rows with fewer columns
Your original query could be improved like this, but this is just lipstick on a pig. Do not use this inefficient approach.
CREATE OR REPLACE FUNCTION get_fill_rates()
RETURNS TABLE (field_name text, fill_count bigint, fill_percentage float) AS
$$
DECLARE
fields text[] := '{column_a, column_b, column_c}'; -- must be legal identifiers!
total_rows float; -- use float right away
BEGIN
SELECT reltuples INTO total_rows FROM pg_class WHERE relname = 'my_table';
FOREACH field_name IN ARRAY fields -- use FOREACH
LOOP
EXECUTE 'SELECT COUNT(*) FROM big WHERE ' || field_name || ' IS NOT NULL'
INTO fill_count;
fill_percentage := fill_count / total_rows; -- already type float
RETURN NEXT;
END LOOP;
END
$$ LANGUAGE plpgsql;
Plus, pg_class.reltuples is only an estimate. Since you are counting anyway, use an actual count.
Related:
Iterating over integer[] in PL/pgSQL
Fast way to discover the row count of a table in PostgreSQL
Given a table with a naming scheme as follows:
Example 1:
INFO_APPLICATION_B
CORRESPOND_U|OBJECTIVE_U|APPLICATION_U|DENIED_U|ACCEPTED_U|
Example 2:
INFO_CITIZEN_B
REFUGEE_U|INCOME_U|EDUCATION_U|CITIZEN_U|
I would like to filter out the column name in the table that is similar (as seen in the examples) to the table name. Precisely, in the first example, column number 3 would ideally be flagged due to its similarity with its respective table name. The same idea follows in example 2 where column number 4 would be flagged.
How can I go about doing this in SQL?
I want an output which does not display the columns whose name is similar to the table name:
CORRESPOND_U|OBJECTIVE_U||DENIED_U|ACCEPTED_U|
Notice how "APPLICATION_U" is no longer there because it was similar to the table name "APPLICATION_B".
Loop through _v_relation_column and determine if the column contains the text in the table.
I suspect you mean "similar" in some other way, but I'm just following your example.
create or replace find_similar_columns(varchar(any))
returns varchar
language nzplsql
begin_proc
declare
input_table alias for $1;
included_columns varchar;
delimiter varchar;
column record;
filtered_column varchar;
begin
delimiter := '|';
included_columns := '';
for column in
select
attname
from
_v_relation_column
where
name = input_table order by attnum
loop
filtered_column := replace(column.attname, '\_U', '');
if input_table not like '%' || filtered_column || '%' then
included_columns := included_columns || delimiter || column.attname;
end if;
end loop;
end;
end_proc;
How is it possible to send a query to all databases on a server? I do not want to input all databases names, the script should auto-detect them.
example query:
SELECT SUM(tourney_results.amt_won)-SUM((tourney_summary.amt_buyin+tourney_summary.amt_fee)) as results
FROM tourney_results
INNER JOIN tourney_summary
ON tourney_results.id_tourney=tourney_summary.id_tourney
Where id_player=(SELECT id_player FROM player WHERE player_name='Apple');
So what I want to achieve here, if there is 2 databases, the first one would result 60, the second one would result 50, I need the 55 output here.
All databeses would have the same structure, tables etc.
You can do it using plpgsql and db_link. First install the db_link extension in the database you are connecting to:
CREATE EXTENSION dblink;
Then use a plpgsql function which iterates over all database on the server and executes the query. See this example (see comments inline). Note that I used a sample query in the function. You have to adapt the function with your real query:
CREATE or REPLACE FUNCTION test_dblink() RETURNS BIGINT AS
$$
DECLARE pg_database_row record;
query_result BIGINT;
_dbname TEXT;
_conn_name TEXT;
return_value BIGINT;
BEGIN
--initialize the final value
return_value = 0;
--first iterate over the records in the meta table pg_database
FOR pg_database_row in SELECT * FROM pg_database WHERE (NOT datistemplate) AND (datallowconn) LOOP
_dbname = pg_database_row.datname;
--build a connection name for db_link
_conn_name=_dbname||'myconn';
--close the connection is already active:
IF array_contains(dblink_get_connections(),_conn_name) THEN
PERFORM dblink_disconnect(_conn_name);
END IF;
-- open the connection with the actual database name
PERFORM dblink_connect(_dbname||'myconn', 'dbname='||_dbname);
-- check if the table does exist in the database:
PERFORM * FROM dblink(_conn_name,'SELECT 1 from pg_tables where tablename = ''your_table''') AS t(id int) ;
IF FOUND THEN
-- if the table exist, perform the query and save the result in a variable
SELECT * FROM dblink(_conn_name,'SELECT sum(id) FROM your_table limit 1') AS t(total int) INTO query_result;
IF query_result IS NOT NULL THEN
return_value = return_value + query_result;
END IF;
END IF;
PERFORM dblink_disconnect(_conn_name);
END LOOP;
RETURN return_value;
END;
$$
LANGUAGE 'plpgsql';
Execute the function with
select test_dblink();
I am trying write function which open cursor with dynamic column name in it.
And I am concerned about obvious SQL injection possibility here.
I was happy to see in the fine manual that this can be easily done, but when I try it in my example, it goes wrong with
error: column does not exist.
My current attempt can be condensed into this SQL Fiddle. Below, I present formatted code for this fiddle.
The goal of tst() function is to be able to count distinct occurances of values in any given column of constant query.
I am asking for hint what am I doing wrong, or maybe some alternative way to achieve the same goal in a safe way.
CREATE TABLE t1 (
f1 character varying not null,
f2 character varying not null
);
CREATE TABLE t2 (
f1 character varying not null,
f2 character varying not null
);
INSERT INTO t1 (f1,f2) VALUES ('a1','b1'), ('a2','b2');
INSERT INTO t2 (f1,f2) VALUES ('a1','c1'), ('a2','c2');
CREATE OR REPLACE FUNCTION tst(p_field character varying)
RETURNS INTEGER AS
$BODY$
DECLARE
v_r record;
v_cur refcursor;
v_sql character varying := 'SELECT count(DISTINCT(%I)) as qty
FROM t1 LEFT JOIN t2 ON (t1.f1=t2.f1)';
BEGIN
OPEN v_cur FOR EXECUTE format(v_sql,lower(p_field));
FETCH v_cur INTO v_r;
CLOSE v_cur;
return v_r.qty;
END;
$BODY$
LANGUAGE plpgsql;
Test execution:
SELECT tst('t1.f1')
Provides error message:
ERROR: column "t1.f1" does not exist
Hint: PL/pgSQL function tst(character varying) line 1 at OPEN
This would work:
SELECT tst('f1');
The problem you are facing: format() interprets parameters concatenated with %I as one identifier. You are trying to pass a table-qualified column name that consists of two identifiers, which is interpreted as "t1.f1" (one name, double-quoted to preserve the otherwise illegal dot in the name.
If you want to pass table and column name, use two parameters:
CREATE OR REPLACE FUNCTION tst2(_col text, _tbl text = NULL)
RETURNS int AS
$func$
DECLARE
v_r record;
v_cur refcursor;
v_sql text := 'SELECT count(DISTINCT %s) AS qty
FROM t1 LEFT JOIN t2 USING (f1)';
BEGIN
OPEN v_cur FOR EXECUTE
format(v_sql, CASE WHEN _tbl <> '' -- rule out NULL and ''
THEN quote_ident(lower(_tbl)) || '.' ||
quote_ident(lower(_col))
ELSE quote_ident(lower(_col)) END);
FETCH v_cur INTO v_r;
CLOSE v_cur;
RETURN v_r.qty;
END
$func$ LANGUAGE plpgsql;
Aside: It's DISTINCT f1- no parentheses around the column name, unless you want to make it a row type.
Actually, you don't need a cursor for this at all. Faster, simpler:
CREATE OR REPLACE FUNCTION tst3(_col text, _tbl text = NULL, OUT ct bigint) AS
$func$
BEGIN
EXECUTE format('SELECT count(DISTINCT %s) AS qty
FROM t1 LEFT JOIN t2 USING (f1)'
, CASE WHEN _tbl <> '' -- rule out NULL and ''
THEN quote_ident(lower(_tbl)) || '.' ||
quote_ident(lower(_col))
ELSE quote_ident(lower(_col)) END)
INTO ct;
RETURN;
END
$func$ LANGUAGE plpgsql;
I provided NULL as parameter default for convenience. This way you can call the function with just a column name or with column and table name. But not without column name.
Call:
SELECT tst3('f1', 't1');
SELECT tst3('f1');
SELECT tst3(_col := 'f1');
Same as for test2().
SQL Fiddle.
Related answer:
Table name as a PostgreSQL function parameter