How to find maximum of each column using Postgres Sql Functions - sql

So I want MAX and MIN value from each column from table when execute function and input of that function is table.
for example-: Select * from public.f_count_MAX('table_name')
Any suggestions thanks in advance.
I have already created a function for finding null value which is below
CREATE OR REPLACE FUNCTION public.f_count_nulls(
_tbl regclass)
RETURNS TABLE(column_name text, missing_values bigint)
LANGUAGE 'plpgsql'
COST 100
STABLE PARALLEL SAFE
ROWS 1000
AS $BODY$
BEGIN
RETURN QUERY EXECUTE (
SELECT format(
$$
SELECT x.*
FROM (SELECT count(*) AS ct, %s FROM %s) t
CROSS JOIN LATERAL (VALUES %s) x(col, nulls)
ORDER BY nulls DESC, col DESC
$$, string_agg(format('count(%1$I) AS %1$I', attname), ', ')
, $1
, string_agg(format('(%1$L, ct - %1$I)', attname), ', ')
)
FROM pg_catalog.pg_attribute
WHERE attrelid = $1
AND attnum > 0
AND NOT attisdropped
-- more filters?
);
END
$BODY$;

Related

Apply function to all columns in a Postgres table dynamically

Using Postgres 13.1, I want to apply a forward fill function to all columns of a table. The forward fill function is explained in my earlier question:
How to do forward fill as a PL/PGSQL function
However, in that case the columns and table are specified. I want to take that code and apply it to an arbitrary table, ie. specify a table and the forward fill is applied to each of the columns.
Using this table as an example:
CREATE TABLE example(row_num int, id int, str text, val integer);
INSERT INTO example VALUES
(1, 1, '1a', NULL)
, (2, 1, NULL, 1)
, (3, 2, '2a', 2)
, (4, 2, NULL, NULL)
, (5, 3, NULL, NULL)
, (6, 3, '3a', 31)
, (7, 3, NULL, NULL)
, (8, 3, NULL, 32)
, (9, 3, '3b', NULL)
, (10,3, NULL, NULL)
;
I start with the following working base for the function. I call it passing in some variable names. Note the first is a table name not a column name. The function takes the table name and creates an array of all the column names and then outputs the names.
create or replace function col_collect(tbl text, id text, row_num text)
returns text[]
language plpgsql as
$func$
declare
tmp text[];
col text;
begin
select array (
select column_name
from information_schema."columns" c
where table_name = tbl
) into tmp;
foreach col in array tmp
loop
raise notice 'col: %', col;
end loop;
return tmp;
end
$func$;
I want to apply the "forward fill" function I got from my earlier question to each column of a table. UPDATE seems to be the correct approach. So this is the preceding function where I replace raise notice by an update using execute so I can pass in the table name:
create or replace function col_collect(tbl text, id text, row_num text)
returns void
language plpgsql as
$func$
declare
tmp text[];
col text;
begin
select array (
select column_name
from information_schema."columns" c
where table_name = tbl
) into tmp;
foreach col in array tmp
loop
execute 'update '||tbl||'
set '||col||' = gapfill('||col||') OVER w AS '||col||'
where '||tbl||'.row_num = '||col||'.row_num
window w as (PARTITION BY '||id||' ORDER BY '||row_num||')
returning *;';
end loop;
end
$func$;
-- call the function
select col_collect('example','id','row_num')
The preceding errors out with a syntax error. I have tried many variations on this but they all fail. Helpful answers on SO were here and here. The aggregate function I'm trying to apply (as window function) is:
CREATE OR REPLACE FUNCTION gap_fill_internal(s anyelement, v anyelement)
RETURNS anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN COALESCE(v, s); -- that's all!
END
$func$;
CREATE AGGREGATE gap_fill(anyelement) (
SFUNC = gap_fill_internal,
STYPE = anyelement
);
My questions are:
is this a good approach and if so what am I doing wrong; or
is there a better way to do this?
What you ask is not a trivial task. You should be comfortable with PL/pgSQL. I do not advise this kind of dynamic SQL queries for beginners, too powerful.
That said, let's dive in. Buckle up!
CREATE OR REPLACE FUNCTION f_gap_fill_update(_tbl regclass, _id text, _row_num text, OUT nullable_columns int, OUT updated_rows int)
LANGUAGE plpgsql AS
$func$
DECLARE
_pk text := quote_ident(_row_num);
_sql text;
BEGIN
SELECT INTO _sql, nullable_columns
concat_ws(E'\n'
, 'UPDATE ' || _tbl || ' t'
, 'SET (' || string_agg( quote_ident(a.attname), ', ') || ')'
, ' = (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
, 'FROM ('
, ' SELECT ' || _pk
, ' , ' || string_agg(format('gap_fill(%1$I) OVER w AS %1$I', a.attname), ', ')
, ' FROM ' || _tbl
, format(' WINDOW w AS (PARTITION BY %I ORDER BY %s)', _id, _pk)
, ' ) u'
, format('WHERE t.%1$s = u.%1$s', _pk)
, 'AND (' || string_agg('t.' || quote_ident(a.attname), ', ') || ') IS DISTINCT FROM'
, ' (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
)
, count(*) -- AS _col_ct
FROM (
SELECT a.attname
FROM pg_attribute a
WHERE a.attrelid = _tbl
AND a.attnum > 0
AND NOT a.attisdropped
AND NOT a.attnotnull
ORDER BY a.attnum
) a;
IF nullable_columns = 0 THEN
RAISE EXCEPTION 'No nullable columns found in table >>%<<', _tbl;
ELSIF _sql IS NULL THEN
RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
END IF;
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql; -- execute
GET DIAGNOSTICS updated_rows = ROW_COUNT;
END
$func$;
Example call:
SELECT * FROM f_gap_fill_update('example', 'id', 'row_num');
db<>fiddle here
The function is state of the art.
Generates and executes a query of the form:
UPDATE tbl t
SET (str, val, col1)
= (u.str, u.val, u.col1)
FROM (
SELECT row_num
, gap_fill(str) OVER w AS str, gap_fill(val) OVER w AS val
, gap_fill(col1) OVER w AS col1
FROM tbl
WINDOW w AS (PARTITION BY id ORDER BY row_num)
) u
WHERE t.row_num = u.row_num
AND (t.str, t.val, t.col1) IS DISTINCT FROM
(u.str, u.val, u.col1)
Using pg_catalog.pg_attribute instead of the information schema. See:
"Information schema vs. system catalogs"
Note the final WHERE clause to prevent (possibly expensive) empty updates. Only rows that actually change will be written. See:
How do I (or can I) SELECT DISTINCT on multiple columns?
Moreover, only nullable columns (not defined NOT NULL) will even be considered, to avoid unnecessary work.
Using ROW syntax in UPDATE to keep the code simple. See:
SQL update fields of one table from fields of another one
The function returns two integer values: nullable_columns and updated_rows, reporting what the names suggest.
The function defends against SQL injection properly. See:
Table name as a PostgreSQL function parameter
SQL injection in Postgres functions vs prepared queries
About GET DIAGNOSTICS:
Calculate number of rows affected by batch query in PostgreSQL
The above function updates, but does not return rows. Here is a basic demo how to return rows of varying type:
CREATE OR REPLACE FUNCTION f_gap_fill_select(_tbl_type anyelement, _id text, _row_num text)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := pg_typeof(_tbl_type)::text::regclass;
_sql text;
BEGIN
SELECT INTO _sql
'SELECT ' || string_agg(CASE WHEN a.attnotnull
THEN format('%I', a.attname)
ELSE format('gap_fill(%1$I) OVER w AS %1$I', a.attname) END
, ', ' ORDER BY a.attnum)
|| E'\nFROM ' || _tbl
|| format(E'\nWINDOW w AS (PARTITION BY %I ORDER BY %I)', _id, _row_num)
FROM pg_attribute a
WHERE a.attrelid = _tbl
AND a.attnum > 0
AND NOT a.attisdropped;
IF _sql IS NULL THEN
RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
END IF;
RETURN QUERY EXECUTE _sql;
-- RAISE NOTICE '%', _sql; -- debug
END
$func$;
Call (note special syntax!):
SELECT * FROM f_gap_fill_select(NULL::example, 'id', 'row_num');
db<>fiddle here
About returning a polymorphic row type:
Refactor a PL/pgSQL function to return the output of various SELECT queries

why does this function not return the excpected tsrange[]

I have these table with this dataset
CREATE TABLE employee (entity_id uuid, valid tsrange);
INSERT into employee (entity_id, valid)
VALUES
('0006f79d-5af7-4b29-a200-6aef3bb0105f', tsrange('2009-05-23 02:00:00','2010-08-27 02:00:00')),
('0006f79d-5af7-4b29-a200-6aef3bb0105f', tsrange('2010-08-27 02:00:00','2010-10-27 02:00:00')),
('0006f79d-5af7-4b29-a200-6aef3bb0105f', tsrange('2011-05-23 02:00:00','infinity'))
for which I want to select an each continous aggregated tsrange, as an tsrange[], as for a specific entity_id.
For this I have this function
CREATE OR REPLACE FUNCTION tsrange_cluster_aggregate(_tbl regclass, selected_entity_id uuid)
RETURNS SETOF TSRANGE AS
$$
BEGIN
EXECUTE
FORMAT(
'SELECT tsrange(min(COALESCE(lower(valid), ''-infinity'')), max(COALESCE(upper(valid), ''infinity'')))
FROM (
SELECT *, count(nextstart > enddate OR NULL) OVER (ORDER BY valid DESC NULLS LAST) AS grp
FROM (
SELECT valid
, max(COALESCE(upper(valid), ''infinity'')) OVER (ORDER BY valid) AS enddate
, lead(lower(valid)) OVER (ORDER BY valid) As nextstart
FROM %1$I
where entity_id = %2$L
) a
) b
GROUP BY grp
ORDER BY 1;'
, _tbl , selected_entity_id)
USING _tbl, selected_entity_id;
END
$$ LANGUAGE plpgsql;
SELECT tsrange_cluster_aggregate('employee'::regclass, '0006f79d-5af7-4b29-a200-6aef3bb0105f');
Problem here is that output of this function is always empty?
I get the expected output when I run it outside an function?
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=a09f8033091ea56e533880feebd0d5d4
You are missing a RETURN in your function:
CREATE OR REPLACE FUNCTION tsrange_cluster_aggregate(_tbl regclass, selected_entity_id uuid)
RETURNS SETOF tsrange[] AS
$$
BEGIN
RETURN QUERY EXECUTE
FORMAT(
' SELECT array_agg(j) FROM (
SELECT tsrange(min(COALESCE(lower(valid), ''-infinity'')), max(COALESCE(upper(valid), ''infinity'')))
FROM (
SELECT *, count(nextstart > enddate OR NULL) OVER (ORDER BY valid DESC NULLS LAST) AS grp
FROM (
SELECT valid
, max(COALESCE(upper(valid), ''infinity'')) OVER (ORDER BY valid) AS enddate
, lead(lower(valid)) OVER (ORDER BY valid) As nextstart
FROM %1$I
where entity_id = %2$L
) a
) b
GROUP BY grp
ORDER BY 1) z (j);'
, _tbl , selected_entity_id)
USING _tbl, selected_entity_id;
END
$$ LANGUAGE plpgsql;
Demo: db<>fiddle

Count the number of null values into an Oracle table?

I need to count the number of null values of all the columns in a table in Oracle.
For instance, I execute the following statements to create a table TEST and insert data.
CREATE TABLE TEST
( A VARCHAR2(20 BYTE),
B VARCHAR2(20 BYTE),
C VARCHAR2(20 BYTE)
);
Insert into TEST (A) values ('a');
Insert into TEST (B) values ('b');
Insert into TEST (C) values ('c');
Now, I write the following code to compute the number of null values in the table TEST:
declare
cnt number :=0;
temp number :=0;
begin
for r in ( select column_name, data_type
from user_tab_columns
where table_name = upper('test')
order by column_id )
loop
if r.data_type <> 'NOT NULL' then
select count(*) into temp FROM TEST where r.column_name IS NULL;
cnt := cnt + temp;
END IF;
end loop;
dbms_output.put_line('Total: '||cnt);
end;
/
It returns 0, when the expected value is 6.
Where is the error?
Thanks in advance.
Counting NULLs for each column
In order to count NULL values for all columns of a table T you could run
SELECT COUNT(*) - COUNT(col1) col1_nulls
, COUNT(*) - COUNT(col2) col2_nulls
,..
, COUNT(*) - COUNT(colN) colN_nulls
, COUNT(*) total_rows
FROM T
/
Where col1, col2, .., colN should be replaced with actual names of columns of T table.
Aggregate functions -like COUNT()- ignore NULL values, so COUNT(*) - COUNT(col) will give you how many nulls for each column.
Summarize all NULLs of a table
If you want to know how many fields are NULL, I mean every NULL of every record you can
WITH d as (
SELECT COUNT(*) - COUNT(col1) col1_nulls
, COUNT(*) - COUNT(col2) col2_nulls
,..
, COUNT(*) - COUNT(colN) colN_nulls
, COUNT(*) total_rows
FROM T
) SELECT col1_nulls + col1_nulls +..+ colN_null
FROM d
/
Summarize all NULLs of a table (using Oracle dictionary tables)
Following is an improvement in which you need to now nothing but table name and it is very easy to code a function based on it
DECLARE
T VARCHAR2(64) := '<YOUR TABLE NAME>';
expr VARCHAR2(32767);
q INTEGER;
BEGIN
SELECT 'SELECT /*+FULL(T) PARALLEL(T)*/' || COUNT(*) || ' * COUNT(*) OVER () - ' || LISTAGG('COUNT(' || COLUMN_NAME || ')', ' + ') WITHIN GROUP (ORDER BY COLUMN_ID) || ' FROM ' || T
INTO expr
FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = T;
-- This line is for debugging purposes only
DBMS_OUTPUT.PUT_LINE(expr);
EXECUTE IMMEDIATE expr INTO q;
DBMS_OUTPUT.PUT_LINE(q);
END;
/
Due to calculation implies a full table scan, code produced in expr variable was optimized for parallel running.
User defined function null_fields
Function version, also includes an optional parameter to be able to run on other schemas.
CREATE OR REPLACE FUNCTION null_fields(table_name IN VARCHAR2, owner IN VARCHAR2 DEFAULT USER)
RETURN INTEGER IS
T VARCHAR2(64) := UPPER(table_name);
o VARCHAR2(64) := UPPER(owner);
expr VARCHAR2(32767);
q INTEGER;
BEGIN
SELECT 'SELECT /*+FULL(T) PARALLEL(T)*/' || COUNT(*) || ' * COUNT(*) OVER () - ' || listagg('COUNT(' || column_name || ')', ' + ') WITHIN GROUP (ORDER BY column_id) || ' FROM ' || o || '.' || T || ' t'
INTO expr
FROM all_tab_columns
WHERE table_name = T;
EXECUTE IMMEDIATE expr INTO q;
RETURN q;
END;
/
-- Usage 1
SELECT null_fields('<your table name>') FROM dual
/
-- Usage 2
SELECT null_fields('<your table name>', '<table owner>') FROM dual
/
Thank you #Lord Peter :
The below PL/SQL script works
declare
cnt number :=0;
temp number :=0;
begin
for r in ( select column_name, nullable
from user_tab_columns
where table_name = upper('test')
order by column_id )
loop
if r.nullable = 'Y' then
EXECUTE IMMEDIATE 'SELECT count(*) FROM test where '|| r.column_name ||' IS NULL' into temp ;
cnt := cnt + temp;
END IF;
end loop;
dbms_output.put_line('Total: '||cnt);
end;
/
The table name test may be replaced the name of table of your interest.
I hope this solution is useful!
The dynamic SQL you execute (this is the string used in EXECUTE IMMEDIATE) should be
select sum(
decode(a,null,1,0)
+decode(b,null,1,0)
+decode(c,null,1,0)
) nullcols
from test;
Where each summand corresponds to a NOT NULL column.
Here only one table scan is necessary to get the result.
Use the data dictionary to find the number of NULL values almost instantly:
select sum(num_nulls) sum_num_nulls
from all_tab_columns
where owner = user
and table_name = 'TEST';
SUM_NUM_NULLS
-------------
6
The values will only be correct if optimizer statistics were gathered recently and if they were gathered with the default value for the sample size.
Those may seem like large caveats but it's worth becoming familiar with your database's statistics gathering process anyway. If your database is not automatically gathering statistics or if your database is not using the default sample size those are likely huge problems you need to be aware of.
To manually gather stats for a specific table a statement like this will work:
begin
dbms_stats.gather_table_stats(user, 'TEST');
end;
/
select COUNT(1) TOTAL from table where COLUMN is NULL;

To get column name in the result in postgresql

In the below query I am getting the values in a array format and I want to get column name too.
CREATE OR REPLACE FUNCTION load_page(IN _session integer)
RETURNS TABLE(col1 text, col2 text) AS
$BODY$
BEGIN
RETURN QUERY
select
(SELECT array_agg(sq.*)
FROM (SELECT user_id, user_name
FROM "user"
) sq
)::text,
(SELECT array_agg(sq.*)
FROM (SELECT client_id, client_name ,client_desc
FROM "clients"
) sq
)::text;
END;
$BODY$ LANGUAGE plpgsql STABLE;
The Result is :
"("{""(2,Test)"",""(5,Santhosh)"",""(3,Test1)""}","{""(1,Test1,Test1)"",""(2,test2,test2)"",""(3,test3,test3)""}")"
You could simply type them and concat into one column to use it as a row in outer select:
CREATE OR REPLACE FUNCTION load_page(IN _session integer)
RETURNS TABLE(col1 text, col2 text) AS
$BODY$
BEGIN
RETURN QUERY
select
(SELECT array_agg(sq.*)
FROM (SELECT concat('user_id: ',user_id,', user_name: ', user_name)
FROM "user"
) sq
)::text,
(SELECT array_agg(sq.*)
FROM (SELECT concat('client_id: ',client_id,', client_name: ',client_name,', client_desc: ',client_desc)
FROM "clients"
) sq
)::text;
END;
$BODY$ LANGUAGE plpgsql STABLE;
If you want to discard information about particular column if it's value is NULL you can append column names conditionally (with a CASE statement).
The below function does not produce the curly braces or the flood of double quotes, but it does include the column names and it is quite a bit more concise and readable and efficient than your initial version.
The _session parameter is never used, but I shall assume that this is all part of some larger query/function.
CREATE OR REPLACE FUNCTION load_page(IN _session integer, OUT col1 text, OUT col2 text)
RETURNS SETOF record AS $BODY$
BEGIN
SELECT string_agg('user_id: ' || user_id ||
', user_name: ' || user_name, ',')
INTO col1
FROM "user";
SELECT string_agg('client_id: ' || client_id ||
', client_name: ' || client_name ||
', client_desc: ' || client_desc, ',')
INTO col2
FROM "clients";
RETURN NEXT;
END;
$BODY$ LANGUAGE plpgsql STABLE;

Order by in subquery (for jQuery jTable) doesn't work?

Some background:
My framework jQuery jTable, allows me to do pagination and sort columns, in my select query I need to retrieve n rows (from nth, to nth) and previously order the data by the selected column.
I have a table with n columns where would not exist some rows (this is an example):
To achieve the first requirement I wrote the follow procedure:
create or replace
PROCEDURE PR_SHOWVALUESOLD
(
PRMROWMIN IN NUMBER
, PRMROWMAX IN NUMBER
, CURSORRESULT OUT SYS_REFCURSOR
) AS
BEGIN
open CURSORRESULT for
select * from
(select v.*, rownum r,
(
select count(*) TOTALITEMS from TABLE1 v
) TOTALITEMS
from TABLE1 v
) d
where d.r >= PRMROWMIN and d.r <= PRMROWMAX;
END PR_SHOWVALUESOLD;
This work successfully, I execute the procedure with the follows parameters (PRMROWMIN = 6, PRMROWMAX = 9), the result of the procedure are in Output Varibles window.
Now comes the next step, I need to order the data before take from n to x row.
I rewrite the procedure to do this, but doesn't work:
CREATE OR REPLACE PROCEDURE PR_SHOWVALUES
(
PRMROWMIN IN NUMBER
, PRMROWMAX IN NUMBER
, PRMORDERCOL IN VARCHAR2
, PRMORDERDIR IN VARCHAR2
, CURSORRESULT OUT SYS_REFCURSOR
) AS
BEGIN
open CURSORRESULT for
select * from
(select v.*, rownum r,
(
select count(*) TOTALITEMS from TABLE1 v
) TOTALITEMS
from TABLE1 v
order by 'LOWER(' || PRMORDERCOL || ')' || ' ' || PRMORDERDIR
) d
where d.r >= PRMROWMIN and d.r <= PRMROWMAX;
END PR_SHOWVALUES;
I executed the modified procedure with the follows parameters:
PRMROWMIN := 6;
PRMROWMAX := 9;
PRMORDERCOL := 'COLUMNA';
PRMORDERDIR := 'DESC';
I expected the highlighted rows Query Result 2 window (but this new procedure retrieve the same data as old but disordered Output Variables Window):
How to achieve my requirements?
Thanks in advance.
This is your order by:
order by 'LOWER(' || PRMORDERCOL || ')' || ' ' || PRMORDERDIR
It is not applying the function lower(). Instead, it is concatenating the strings. You may mean:
order by LOWER(PRMORDERCOL) ' ' || PRMORDERDIR
I found a solution to my requirement.
I need to use DECODE to match every column to sort.
I can order in subquery, in this case I do two order by in two subqueries.
The documentation of PL/SQL DECODE function are in:
PL/SQL DECODE FUNCTION
The final Procedure are:
CREATE OR REPLACE PROCEDURE PR_SHOWVALUES
(
PRMROWMIN IN NUMBER
, PRMROWMAX IN NUMBER
, PRMORDERCOL IN VARCHAR2
, PRMORDERDIR IN VARCHAR2
, CURSORRESULT OUT SYS_REFCURSOR
) AS
BEGIN
open CURSORRESULT for
select * from (
select rownum r, v.* from
(
select * from
(
select * from table1 tbl
order by decode
(
UPPER(PRMORDERCOL),
'COLUMNA', LOWER(tbl.COLUMNA),
'COLUMNB', LOWER(tbl.COLUMNB),
LOWER(tbl.TABLE1_ID)
)
)
ORDER BY
CASE
WHEN UPPER(PRMORDERDIR) = 'DESC' THEN
ROWNUM * -1
ELSE
ROWNUM
END
) v
)
where r >= PRMROWMIN and r <= PRMROWMAX;
END PR_SHOWVALUES;
Acknowledgment to Jack David Baucum where I found the solution.