Using Postgres 13.1, I want to apply a forward fill function to all columns of a table. The forward fill function is explained in my earlier question:
How to do forward fill as a PL/PGSQL function
However, in that case the columns and table are specified. I want to take that code and apply it to an arbitrary table, ie. specify a table and the forward fill is applied to each of the columns.
Using this table as an example:
CREATE TABLE example(row_num int, id int, str text, val integer);
INSERT INTO example VALUES
(1, 1, '1a', NULL)
, (2, 1, NULL, 1)
, (3, 2, '2a', 2)
, (4, 2, NULL, NULL)
, (5, 3, NULL, NULL)
, (6, 3, '3a', 31)
, (7, 3, NULL, NULL)
, (8, 3, NULL, 32)
, (9, 3, '3b', NULL)
, (10,3, NULL, NULL)
;
I start with the following working base for the function. I call it passing in some variable names. Note the first is a table name not a column name. The function takes the table name and creates an array of all the column names and then outputs the names.
create or replace function col_collect(tbl text, id text, row_num text)
returns text[]
language plpgsql as
$func$
declare
tmp text[];
col text;
begin
select array (
select column_name
from information_schema."columns" c
where table_name = tbl
) into tmp;
foreach col in array tmp
loop
raise notice 'col: %', col;
end loop;
return tmp;
end
$func$;
I want to apply the "forward fill" function I got from my earlier question to each column of a table. UPDATE seems to be the correct approach. So this is the preceding function where I replace raise notice by an update using execute so I can pass in the table name:
create or replace function col_collect(tbl text, id text, row_num text)
returns void
language plpgsql as
$func$
declare
tmp text[];
col text;
begin
select array (
select column_name
from information_schema."columns" c
where table_name = tbl
) into tmp;
foreach col in array tmp
loop
execute 'update '||tbl||'
set '||col||' = gapfill('||col||') OVER w AS '||col||'
where '||tbl||'.row_num = '||col||'.row_num
window w as (PARTITION BY '||id||' ORDER BY '||row_num||')
returning *;';
end loop;
end
$func$;
-- call the function
select col_collect('example','id','row_num')
The preceding errors out with a syntax error. I have tried many variations on this but they all fail. Helpful answers on SO were here and here. The aggregate function I'm trying to apply (as window function) is:
CREATE OR REPLACE FUNCTION gap_fill_internal(s anyelement, v anyelement)
RETURNS anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN COALESCE(v, s); -- that's all!
END
$func$;
CREATE AGGREGATE gap_fill(anyelement) (
SFUNC = gap_fill_internal,
STYPE = anyelement
);
My questions are:
is this a good approach and if so what am I doing wrong; or
is there a better way to do this?
What you ask is not a trivial task. You should be comfortable with PL/pgSQL. I do not advise this kind of dynamic SQL queries for beginners, too powerful.
That said, let's dive in. Buckle up!
CREATE OR REPLACE FUNCTION f_gap_fill_update(_tbl regclass, _id text, _row_num text, OUT nullable_columns int, OUT updated_rows int)
LANGUAGE plpgsql AS
$func$
DECLARE
_pk text := quote_ident(_row_num);
_sql text;
BEGIN
SELECT INTO _sql, nullable_columns
concat_ws(E'\n'
, 'UPDATE ' || _tbl || ' t'
, 'SET (' || string_agg( quote_ident(a.attname), ', ') || ')'
, ' = (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
, 'FROM ('
, ' SELECT ' || _pk
, ' , ' || string_agg(format('gap_fill(%1$I) OVER w AS %1$I', a.attname), ', ')
, ' FROM ' || _tbl
, format(' WINDOW w AS (PARTITION BY %I ORDER BY %s)', _id, _pk)
, ' ) u'
, format('WHERE t.%1$s = u.%1$s', _pk)
, 'AND (' || string_agg('t.' || quote_ident(a.attname), ', ') || ') IS DISTINCT FROM'
, ' (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
)
, count(*) -- AS _col_ct
FROM (
SELECT a.attname
FROM pg_attribute a
WHERE a.attrelid = _tbl
AND a.attnum > 0
AND NOT a.attisdropped
AND NOT a.attnotnull
ORDER BY a.attnum
) a;
IF nullable_columns = 0 THEN
RAISE EXCEPTION 'No nullable columns found in table >>%<<', _tbl;
ELSIF _sql IS NULL THEN
RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
END IF;
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql; -- execute
GET DIAGNOSTICS updated_rows = ROW_COUNT;
END
$func$;
Example call:
SELECT * FROM f_gap_fill_update('example', 'id', 'row_num');
db<>fiddle here
The function is state of the art.
Generates and executes a query of the form:
UPDATE tbl t
SET (str, val, col1)
= (u.str, u.val, u.col1)
FROM (
SELECT row_num
, gap_fill(str) OVER w AS str, gap_fill(val) OVER w AS val
, gap_fill(col1) OVER w AS col1
FROM tbl
WINDOW w AS (PARTITION BY id ORDER BY row_num)
) u
WHERE t.row_num = u.row_num
AND (t.str, t.val, t.col1) IS DISTINCT FROM
(u.str, u.val, u.col1)
Using pg_catalog.pg_attribute instead of the information schema. See:
"Information schema vs. system catalogs"
Note the final WHERE clause to prevent (possibly expensive) empty updates. Only rows that actually change will be written. See:
How do I (or can I) SELECT DISTINCT on multiple columns?
Moreover, only nullable columns (not defined NOT NULL) will even be considered, to avoid unnecessary work.
Using ROW syntax in UPDATE to keep the code simple. See:
SQL update fields of one table from fields of another one
The function returns two integer values: nullable_columns and updated_rows, reporting what the names suggest.
The function defends against SQL injection properly. See:
Table name as a PostgreSQL function parameter
SQL injection in Postgres functions vs prepared queries
About GET DIAGNOSTICS:
Calculate number of rows affected by batch query in PostgreSQL
The above function updates, but does not return rows. Here is a basic demo how to return rows of varying type:
CREATE OR REPLACE FUNCTION f_gap_fill_select(_tbl_type anyelement, _id text, _row_num text)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := pg_typeof(_tbl_type)::text::regclass;
_sql text;
BEGIN
SELECT INTO _sql
'SELECT ' || string_agg(CASE WHEN a.attnotnull
THEN format('%I', a.attname)
ELSE format('gap_fill(%1$I) OVER w AS %1$I', a.attname) END
, ', ' ORDER BY a.attnum)
|| E'\nFROM ' || _tbl
|| format(E'\nWINDOW w AS (PARTITION BY %I ORDER BY %I)', _id, _row_num)
FROM pg_attribute a
WHERE a.attrelid = _tbl
AND a.attnum > 0
AND NOT a.attisdropped;
IF _sql IS NULL THEN
RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
END IF;
RETURN QUERY EXECUTE _sql;
-- RAISE NOTICE '%', _sql; -- debug
END
$func$;
Call (note special syntax!):
SELECT * FROM f_gap_fill_select(NULL::example, 'id', 'row_num');
db<>fiddle here
About returning a polymorphic row type:
Refactor a PL/pgSQL function to return the output of various SELECT queries
Related
Morning,
I'm trying to write a script that will convert Unload tables (UNLD to HDL files) creating a flat file using PLSQL. I keep getting syntax errors trying to run it and would appreciate some help from an expert out there!
Here are the errors:
Error(53,21): PLS-00330: invalid use of type name or subtype name
Error(57,32): PLS-00222: no function with name 'UNLDTABLE' exists in this scope
Our guess is that the unldTable variable is being treated as a String, rather than a database table object (Not really expereinced in PLSQL)
CREATE OR REPLACE PROCEDURE UNLD_TO_HDL (processComponent IN VARCHAR2)
IS
fHandle UTL_FILE.FILE_TYPE;
concatData VARCHAR2(240);
concatHDLMetaTags VARCHAR2(240);
outputFileName VARCHAR2(240);
TYPE rowArrayType IS TABLE OF VARCHAR2(240);
rowArray rowArrayType;
emptyArray rowArrayType;
valExtractArray rowArrayType;
hdlFileName VARCHAR2(240);
unldTable VARCHAR2(240);
countUNLDRows Number;
dataType VARCHAR2(240);
current_table VARCHAR2(30);
value_to_char VARCHAR2(240);
BEGIN
SELECT HDL_FILE_NAME
INTO hdlFileName
FROM GNC_HDL_CREATION_PARAMS
WHERE PROCESS_COMPONENT = processComponent;
SELECT UNLD_TABLE
INTO unldTable
FROM GNC_HDL_CREATION_PARAMS
WHERE PROCESS_COMPONENT = processComponent
FETCH NEXT 1 ROWS ONLY;
SELECT LISTAGG(HDL_META_TAG,'|')
WITHIN GROUP(ORDER BY HDL_META_TAG)
INTO concatHDLMetaTags
FROM GNC_MIG_CONTROL
WHERE HDL_COMP = processComponent;
SELECT DB_FIELD
BULK COLLECT INTO valExtractArray
FROM GNC_MIG_CONTROL
WHERE HDL_COMP = processComponent
ORDER BY HDL_META_TAG;
fHandle := UTL_FILE.FOPEN('./', hdlFileName, 'W');
UTL_FILE.PUTF(fHandle, concatHDLMetaTags + '\n');
SELECT num_rows INTO countUNLDRows FROM user_tables where table_name = unldTable;
FOR row in 1..countUNLDRows LOOP
rowArray := emptyArrayType;
FOR value in 1..valExtractArray.COUNT LOOP
rowArray.extend();
SELECT data_type INTO dataType FROM all_tab_columns where table_name = unldTable AND column_name = valExtractArray(value);
IF dataType = 'VARCHAR2' THEN (SELECT valExtractArray(value) INTO value_to_char FROM current_table WHERE ROWNUM = row);
ELSIF dataType = 'DATE' THEN (SELECT TO_CHAR(valExtractArray(value),'YYYY/MM/DD') INTO value_to_char FROM current_table WHERE ROWNUM = row);
ELSIF dataType = 'NUMBER' THEN (SELECT TO_CHAR(valExtractArray(value)) INTO value_to_char FROM current_table WHERE ROWNUM = row);
ENDIF;
rowArray(value) := value_to_char;
END LOOP;
concatData := NULL;
FOR item in 1..rowArray.COUNT LOOP
IF item = rowArray.COUNT
THEN concatData := (COALESCE(concatData,'') || rowArray(item));
ELSE concatData := (COALESCE(concatData,'') || rowArray(item) || '|');
END IF;
END LOOP;
UTL_FILE.PUTF(fHandle, concatData + '/n');
END LOOP;
UTL_FILE.FCLOSE(fHandle);
END;
Thanks,
Adam
I believe it is just an overlook in your code. You define unldTable as a varchar, which is used correctly until you try to access it as if it were a varray on line 51
rowArray(value) := unldTable(row).valExtractArray(value);
Given that you have not defined it as a varray, unldTable(row) is making the interpreter believe that you are referring to a function.
EDIT
Now that you have moved on, you should resolve the problem of invoking SELECT statements on tables that are unknown at runtime. To do so you need to make use of Dynamic SQL; you can do it in several way, the most direct being an Execute immediate statement in your case:
mystatement := 'SELECT valExtractArray(value) INTO :value_to_char FROM ' || current_table || ' WHERE ROWNUM = ' || row;
execute immediate mystatement USING OUT value_to_char;
It looks like you need to generate a cursor as
select [list of columns from GNC_MIG_CONTROL.DB_FIELD]
from [table name from GNC_HDL_CREATION_PARAMS.UNLD_TABLE]
Assuming setup like this:
create table my_table (business_date date, id integer, dummy1 varchar2(1), dummy2 varchar2(20));
create table gnc_hdl_creation_params (unld_table varchar2(30), process_component varchar2(30));
create table gnc_mig_control (db_field varchar2(30), hdl_comp varchar2(30), hdl_meta_tag integer);
insert into my_table(business_date, id, dummy1, dummy2) values (date '2018-01-01', 123, 'X','Some more text');
insert into gnc_hdl_creation_params (unld_table, process_component) values ('MY_TABLE', 'XYZ');
insert into gnc_mig_control (db_field, hdl_comp, hdl_meta_tag) values ('BUSINESS_DATE', 'XYZ', '1');
insert into gnc_mig_control (db_field, hdl_comp, hdl_meta_tag) values ('ID', 'XYZ', '2');
insert into gnc_mig_control (db_field, hdl_comp, hdl_meta_tag) values ('DUMMY1', 'XYZ', '3');
insert into gnc_mig_control (db_field, hdl_comp, hdl_meta_tag) values ('DUMMY2', 'XYZ', '4');
You could build a query like this:
select unld_table, listagg(expr, q'[||'|'||]') within group (order by hdl_meta_tag) as expr_list
from ( select t.unld_table
, case tc.data_type
when 'DATE' then 'to_char('||c.db_field||',''YYYY-MM-DD'')'
else c.db_field
end as expr
, c.hdl_meta_tag
from gnc_hdl_creation_params t
join gnc_mig_control c
on c.hdl_comp = t.process_component
left join user_tab_columns tc
on tc.table_name = t.unld_table
and tc.column_name = c.db_field
where t.process_component = 'XYZ'
)
group by unld_table;
Output:
UNLD_TABLE EXPR_LIST
----------- --------------------------------------------------------------------------------
MY_TABLE to_char(BUSINESS_DATE,'YYYY-MM-DD')||'|'||ID||'|'||DUMMY1||'|'||DUMMY2
Now if you plug that logic into a PL/SQL procedure you could have something like this:
declare
processComponent constant gnc_hdl_creation_params.process_component%type := 'XYZ';
unloadSQL long;
unloadCur sys_refcursor;
text long;
begin
select 'select ' || listagg(expr, q'[||'|'||]') within group (order by hdl_meta_tag) || ' as text from ' || unld_table
into unloadSQL
from ( select t.unld_table
, case tc.data_type
when 'DATE' then 'to_char('||c.db_field||',''YYYY/MM/DD'')'
else c.db_field
end as expr
, c.hdl_meta_tag
from gnc_hdl_creation_params t
join gnc_mig_control c
on c.hdl_comp = t.process_component
left join user_tab_columns tc
on tc.table_name = t.unld_table
and tc.column_name = c.db_field
where t.process_component = processComponent
)
group by unld_table;
open unloadCur for unloadSQL;
loop
fetch unloadCur into text;
dbms_output.put_line(text);
exit when unloadCur%notfound;
end loop;
close unloadCur;
end;
Output:
2018/01/01|123|X|Some more text
2018/01/01|123|X|Some more text
Now you just have to make that into a procedure, change dbms_output to utl_file and add your meta tags etc and you're there.
I've assumed there is only one distinct unld_table per process component. If there are more you'll need a loop to work through each one.
For a slightly more generic approach, you could build a cursor-to-csv generator which could encapsulate the datatype handling, and then you'd only need to build the SQL as select [columns] from [table]. You might then write a generic cursor to file processor, where you pass in the filename and a cursor and it does the lot.
Edit: I've updated my cursor-to-csv generator to provide file output, so you just need to pass it a cursor and the file details.
I need to count the number of null values of all the columns in a table in Oracle.
For instance, I execute the following statements to create a table TEST and insert data.
CREATE TABLE TEST
( A VARCHAR2(20 BYTE),
B VARCHAR2(20 BYTE),
C VARCHAR2(20 BYTE)
);
Insert into TEST (A) values ('a');
Insert into TEST (B) values ('b');
Insert into TEST (C) values ('c');
Now, I write the following code to compute the number of null values in the table TEST:
declare
cnt number :=0;
temp number :=0;
begin
for r in ( select column_name, data_type
from user_tab_columns
where table_name = upper('test')
order by column_id )
loop
if r.data_type <> 'NOT NULL' then
select count(*) into temp FROM TEST where r.column_name IS NULL;
cnt := cnt + temp;
END IF;
end loop;
dbms_output.put_line('Total: '||cnt);
end;
/
It returns 0, when the expected value is 6.
Where is the error?
Thanks in advance.
Counting NULLs for each column
In order to count NULL values for all columns of a table T you could run
SELECT COUNT(*) - COUNT(col1) col1_nulls
, COUNT(*) - COUNT(col2) col2_nulls
,..
, COUNT(*) - COUNT(colN) colN_nulls
, COUNT(*) total_rows
FROM T
/
Where col1, col2, .., colN should be replaced with actual names of columns of T table.
Aggregate functions -like COUNT()- ignore NULL values, so COUNT(*) - COUNT(col) will give you how many nulls for each column.
Summarize all NULLs of a table
If you want to know how many fields are NULL, I mean every NULL of every record you can
WITH d as (
SELECT COUNT(*) - COUNT(col1) col1_nulls
, COUNT(*) - COUNT(col2) col2_nulls
,..
, COUNT(*) - COUNT(colN) colN_nulls
, COUNT(*) total_rows
FROM T
) SELECT col1_nulls + col1_nulls +..+ colN_null
FROM d
/
Summarize all NULLs of a table (using Oracle dictionary tables)
Following is an improvement in which you need to now nothing but table name and it is very easy to code a function based on it
DECLARE
T VARCHAR2(64) := '<YOUR TABLE NAME>';
expr VARCHAR2(32767);
q INTEGER;
BEGIN
SELECT 'SELECT /*+FULL(T) PARALLEL(T)*/' || COUNT(*) || ' * COUNT(*) OVER () - ' || LISTAGG('COUNT(' || COLUMN_NAME || ')', ' + ') WITHIN GROUP (ORDER BY COLUMN_ID) || ' FROM ' || T
INTO expr
FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = T;
-- This line is for debugging purposes only
DBMS_OUTPUT.PUT_LINE(expr);
EXECUTE IMMEDIATE expr INTO q;
DBMS_OUTPUT.PUT_LINE(q);
END;
/
Due to calculation implies a full table scan, code produced in expr variable was optimized for parallel running.
User defined function null_fields
Function version, also includes an optional parameter to be able to run on other schemas.
CREATE OR REPLACE FUNCTION null_fields(table_name IN VARCHAR2, owner IN VARCHAR2 DEFAULT USER)
RETURN INTEGER IS
T VARCHAR2(64) := UPPER(table_name);
o VARCHAR2(64) := UPPER(owner);
expr VARCHAR2(32767);
q INTEGER;
BEGIN
SELECT 'SELECT /*+FULL(T) PARALLEL(T)*/' || COUNT(*) || ' * COUNT(*) OVER () - ' || listagg('COUNT(' || column_name || ')', ' + ') WITHIN GROUP (ORDER BY column_id) || ' FROM ' || o || '.' || T || ' t'
INTO expr
FROM all_tab_columns
WHERE table_name = T;
EXECUTE IMMEDIATE expr INTO q;
RETURN q;
END;
/
-- Usage 1
SELECT null_fields('<your table name>') FROM dual
/
-- Usage 2
SELECT null_fields('<your table name>', '<table owner>') FROM dual
/
Thank you #Lord Peter :
The below PL/SQL script works
declare
cnt number :=0;
temp number :=0;
begin
for r in ( select column_name, nullable
from user_tab_columns
where table_name = upper('test')
order by column_id )
loop
if r.nullable = 'Y' then
EXECUTE IMMEDIATE 'SELECT count(*) FROM test where '|| r.column_name ||' IS NULL' into temp ;
cnt := cnt + temp;
END IF;
end loop;
dbms_output.put_line('Total: '||cnt);
end;
/
The table name test may be replaced the name of table of your interest.
I hope this solution is useful!
The dynamic SQL you execute (this is the string used in EXECUTE IMMEDIATE) should be
select sum(
decode(a,null,1,0)
+decode(b,null,1,0)
+decode(c,null,1,0)
) nullcols
from test;
Where each summand corresponds to a NOT NULL column.
Here only one table scan is necessary to get the result.
Use the data dictionary to find the number of NULL values almost instantly:
select sum(num_nulls) sum_num_nulls
from all_tab_columns
where owner = user
and table_name = 'TEST';
SUM_NUM_NULLS
-------------
6
The values will only be correct if optimizer statistics were gathered recently and if they were gathered with the default value for the sample size.
Those may seem like large caveats but it's worth becoming familiar with your database's statistics gathering process anyway. If your database is not automatically gathering statistics or if your database is not using the default sample size those are likely huge problems you need to be aware of.
To manually gather stats for a specific table a statement like this will work:
begin
dbms_stats.gather_table_stats(user, 'TEST');
end;
/
select COUNT(1) TOTAL from table where COLUMN is NULL;
In PostgreSQL 9.1, PL/pgSQL, given a query:
select fk_list.relname from ...
where relname is of type name (e.g., "table_name").
How do you get the appropriate value for "relname" that can be used directly in an UPDATE statement as:
Update <relname> set ...
within the PL/pgSQL script?
Using quote_ident(r.relname) as:
Update quote_ident(r.relname) Set ...
fails with:
syntax error at or near "("
LINE 55: UPDATE quote_ident(r.relname) ....
The complete code I am working with:
CREATE FUNCTION merge_children_of_icd9 (ocicd9 text,
ocdesc text, ncicd9 text, ncdesc text)
RETURNS void AS $BODY$
DECLARE
r RECORD;
BEGIN
FOR r IN
WITH fk_actions ( code, action ) AS (
VALUES ('a', 'error'),
('r', 'restrict'),
('c', 'cascade'),
('n', 'set null'),
('d', 'set default')
), fk_list AS (
SELECT pg_constraint.oid AS fkoid, conrelid, confrelid::regclass AS parentid,
conname, relname, nspname,
fk_actions_update.action AS update_action,
fk_actions_delete.action AS delete_action,
conkey AS key_cols
FROM pg_constraint
JOIN pg_class ON conrelid = pg_class.oid
JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
JOIN fk_actions AS fk_actions_update ON confupdtype = fk_actions_update.code
JOIN fk_actions AS fk_actions_delete ON confdeltype = fk_actions_delete.code
WHERE contype = 'f'
), fk_attributes AS (
SELECT fkoid, conrelid, attname, attnum
FROM fk_list
JOIN pg_attribute ON conrelid = attrelid AND attnum = ANY(key_cols)
ORDER BY fkoid, attnum
), fk_cols_list AS (
SELECT fkoid, array_agg(attname) AS cols_list
FROM fk_attributes
GROUP BY fkoid
)
SELECT fk_list.fkoid, fk_list.conrelid, fk_list.parentid, fk_list.conname,
fk_list.relname, fk_cols_list.cols_list
FROM fk_list
JOIN fk_cols_list USING (fkoid)
WHERE parentid = 'icd9'::regclass
LOOP
RAISE NOTICE 'now in loop. relname is %', quote_ident(r.relname);
RAISE NOTICE 'cols_list[1] is %', quote_ident(r.cols_list[1]);
RAISE NOTICE 'cols_list[2] is %', quote_ident(r.cols_list[2]);
RAISE NOTICE 'now doing update';
UPDATE quote_ident(r.relname) SET r.cols_list[1] = ncicd9, r.cols_list[2] = ncdesc
WHERE r.cols_list[1] = ocicd9 AND r.cols_list[2] = ocdesc;
RAISE NOTICE 'finished update';
END LOOP;
RETURN;
END $BODY$ LANGUAGE plpgsql VOLATILE;
-- select merge_children_of_icd9('', 'aodm type 2', '', 'aodm, type 2');
I'm sure this kind of thing is done often, but I can't seem to find anything like it using PostgreSQL. Is there a better way?
In an UPDATE statement in PL/pgSQL, the table name has to be given as a literal. If you want to dynamically set the table name and the columns, you should use the EXECUTE command and paste the query string together:
EXECUTE 'UPDATE ' || quote_ident(r.relname) ||
' SET ' || quote_ident(r.cols_list[1]) || ' = $1, ' ||
quote_ident(r.cols_list[2]) || ' = $2' ||
' WHERE ' || quote_ident(r.cols_list[1]) || ' = $3 AND ' ||
quote_ident(r.cols_list[2]) || ' = $4'
USING ncicd9, ncdesc, ocicd9, ocdesc;
The USING clause can only be used for substituting data values, as shown above.
You need dynamic SQL with EXECUTE like #Patrick already provided.
However, both your function and the EXECUTE part can be much simpler. In particular, use format() to concatenate longer query strings safely (available since pg 9.1):
CREATE OR REPLACE FUNCTION merge_children_of_icd9 (_ocicd9 text, _ocdesc text
, _ncicd9 text, _ncdesc text)
RETURNS void
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text;
BEGIN
FOR _sql IN
SELECT format('UPDATE %3$s SET %1$I = $3 , %2$I = $4
WHERE %1$I = $1 AND %2$I = $2'
, x.cols[1], x.cols[2], x.conrelid::regclass::text)
FROM (
SELECT c.conrelid, array_agg(a.attname ORDER BY a.attnum) AS cols
FROM pg_constraint c
JOIN pg_attribute a ON a.attrelid = c.conrelid
AND a.attnum = ANY(c.conkey)
WHERE c.confrelid = 'icd9'::regclass
AND c.contype = 'f'
GROUP BY c.oid, c.conrelid
ORDER BY c.oid
) x
LOOP
-- RAISE NOTICE '%', _sql; -- debug?
EXECUTE _sql
USING _ocicd9, _ocdesc, _ncicd9, _ncdesc;
END LOOP;
END
$func$;
The function errors out if the FK constraint does not span at least two columns or if the data type of the columns is not compatible with text. May or may not be as intended.
Details on how to pass identifiers safely and execute dynamic SQL:
Table name as a PostgreSQL function parameter
Define table and column names as arguments in a plpgsql function?
Just a silly example:
Table A:
eggs
bread
cheese
Table B (when they are eaten):
Egg | date
Bread | date
Egg | date
cheese | date
Bread | date
For statistics purpouses, i need to have statistics per date per food type in a look like this:
Table Statistics:
egg | bread | cheese
date1 2 1 0
date2 6 4 2
date3 2 0 0
I need the column headers to be dynamic in the report (if new ones are added, it should automatically appear).
Any idea how to make this in postgres?
Thanks.
based on answer Postgres dynamic column headers (from another table) (the work of Eric Vallabh Minikel) i improved the function to be more flexible and convenient. I think it might be useful for others too, especially as it only relies on pg/plsql and does not need installation of extentions as other derivations of erics work (i.e. plpython) do. Testet with 9.3.5 but should also work at least down to 9.2.
Improvements:
deal with pivoted column names containing spaces
deal with multiple row header columns
deal with aggregate function in pivot cell as well as non-aggregated pivot cell (last parameter might be 'sum(cellval)' as well as 'cellval' in case the underlying table/view already does aggregation)
auto detect data type of pivot cell (no need to pass it to the function any more)
Useage:
SELECT get_crosstab_statement('table_to_pivot', ARRAY['rowname' [, <other_row_header_columns_as_well>], 'colname', 'max(cellval)');
Code:
CREATE OR REPLACE FUNCTION get_crosstab_statement(tablename character varying, row_header_columns character varying[], pivot_headers_column character varying, pivot_values character varying)
RETURNS character varying AS
$BODY$
--returns the sql statement to use for pivoting the table
--based on: http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/
--based on: https://stackoverflow.com/questions/4104508/postgres-dynamic-column-headers-from-another-table
--based on: http://www.postgresonline.com/journal/categories/24-tablefunc
DECLARE
arrayname CONSTANT character varying := 'r';
row_headers_simple character varying;
row_headers_quoted character varying;
row_headers_castdown character varying;
row_headers_castup character varying;
row_header_count smallint;
row_header record;
pivot_values_columnname character varying;
pivot_values_datatype character varying;
pivot_headers_definition character varying;
pivot_headers_simple character varying;
sql_row_headers character varying;
sql_pivot_headers character varying;
sql_crosstab_result character varying;
BEGIN
-- 1. create row header definitions
row_headers_simple := array_to_string(row_header_columns, ', ');
row_headers_quoted := '''' || array_to_string(row_header_columns, ''', ''') || '''';
row_headers_castdown := array_to_string(row_header_columns, '::text, ') || '::text';
row_header_count := 0;
sql_row_headers := 'SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = ''' || tablename || ''' AND column_name IN (' || row_headers_quoted || ')';
FOR row_header IN EXECUTE sql_row_headers LOOP
row_header_count := row_header_count + 1;
row_headers_castup := COALESCE(row_headers_castup || ', ', '') || arrayname || '[' || row_header_count || ']::' || row_header.data_type || ' AS ' || row_header.column_name;
END LOOP;
-- 2. retrieve basic column name in case an aggregate function is used
SELECT coalesce(substring(pivot_values FROM '.*\((.*)\)'), pivot_values)
INTO pivot_values_columnname;
-- 3. retrieve pivot values datatype
SELECT data_type
FROM information_schema.columns
WHERE table_name = tablename AND column_name = pivot_values_columnname
INTO pivot_values_datatype;
-- 4. retrieve list of pivot column names.
sql_pivot_headers := 'SELECT string_agg(DISTINCT quote_ident(' || pivot_headers_column || '), '', '' ORDER BY quote_ident(' || pivot_headers_column || ')) as names, string_agg(DISTINCT quote_ident(' || pivot_headers_column || ') || '' ' || pivot_values_datatype || ''', '', '' ORDER BY quote_ident(' || pivot_headers_column || ') || '' ' || pivot_values_datatype || ''') as definitions FROM ' || tablename || ';';
EXECUTE sql_pivot_headers INTO pivot_headers_simple, pivot_headers_definition;
-- 5. set up the crosstab query
sql_crosstab_result := 'SELECT ' || replace (row_headers_castup || ', ' || pivot_headers_simple, ', ', ',
') || '
FROM crosstab (
''SELECT ARRAY[' || row_headers_castdown || '] AS ' || arrayname || ', ' || pivot_headers_column || ', ' || pivot_values || '
FROM ' || tablename || '
GROUP BY ' || row_headers_simple || ', ' || pivot_headers_column || (CASE pivot_values_columnname=pivot_values WHEN true THEN ', ' || pivot_values ELSE '' END) || '
ORDER BY ' || row_headers_simple || '''
,
''SELECT DISTINCT ' || pivot_headers_column || '
FROM ' || tablename || '
ORDER BY ' || pivot_headers_column || '''
) AS newtable (
' || arrayname || ' varchar[]' || ',
' || replace(pivot_headers_definition, ', ', ',
') || '
);';
RETURN sql_crosstab_result;
END
$BODY$
LANGUAGE plpgsql STABLE
COST 100;
I came across the same problem, and found an alternative solution. I'd like to ask for comments here.
The idea is to create the "output type" string for crosstab dynamically. The end result can't be returned by a plpgsql function, because that function would either need to have a static return type (which we don't have) or return setof record, thus having no advantage over the original crosstab function. Therefore my function saves the output cross-table in a view. Likewise, the input table of "pivot" data not yet in cross-table format is taken from an existing view or table.
The usage would be like this, using your example (I added the "fat" field to illustrate the sorting feature):
CREATE TABLE food_fat (name character varying(20) NOT NULL, fat integer);
CREATE TABLE eaten (food character varying(20) NOT NULL, day date NOT NULL);
-- This view will be formatted as cross-table.
-- ORDER BY is important, otherwise crosstab won't merge rows
CREATE TEMPORARY VIEW mymeals AS
SELECT day,food,COUNT(*) AS meals FROM eaten
GROUP BY day, food ORDER BY day;
SELECT auto_crosstab_ordered('mymeals_cross',
'mymeals', 'day', 'food', 'meals', -- what table to convert to cross-table
'food_fat', 'name', 'fat'); -- where to take the columns from
The last statement creates a view mymeals_cross that looks like this:
SELECT * FROM mymeals_cross;
day | bread | cheese | eggs
------------+-------+--------+------
2012-06-01 | 3 | 3 | 2
2012-06-02 | 2 | 1 | 3
(2 rows)
Here comes my implementation:
-- FUNCTION get_col_type(tab, col)
-- returns the data type of column <col> in table <tab> as string
DROP FUNCTION get_col_type(TEXT, TEXT);
CREATE FUNCTION get_col_type(tab TEXT, col TEXT, OUT ret TEXT)
AS $BODY$ BEGIN
EXECUTE $f$
SELECT atttypid::regtype::text
FROM pg_catalog.pg_attribute
WHERE attrelid='$f$||quote_ident(tab)||$f$'::regclass
AND attname='$f$||quote_ident(col)||$f$'
$f$ INTO ret;
END;
$BODY$ LANGUAGE plpgsql;
-- FUNCTION get_crosstab_type(tab, row, val, cattab, catcol, catord)
--
-- This function generates the output type expression for the crosstab(text, text)
-- function from the PostgreSQL tablefunc module when the "categories"
-- (cross table column labels) can be looked up in some view or table.
--
-- See auto_crosstab below for parameters
DROP FUNCTION get_crosstab_type(TEXT, TEXT, TEXT, TEXT, TEXT, TEXT);
CREATE FUNCTION get_crosstab_type(tab TEXT, rw TEXT, val TEXT, cattab TEXT,
catcol TEXT, catord TEXT, OUT ret TEXT)
AS $BODY$ BEGIN
EXECUTE $f$
SELECT '"$f$||quote_ident(rw)||$f$" $f$
||get_col_type(quote_ident(tab), quote_ident(rw))||$f$'
|| string_agg(',"'||_values._v||'" $f$
||get_col_type(quote_ident(tab), quote_ident(val))||$f$')
FROM (SELECT DISTINCT ON(_t.$f$||quote_ident(catord)||$f$) _t.$f$||quote_ident(catcol)||$f$ AS _v
FROM $f$||quote_ident(cattab)||$f$ _t
ORDER BY _t.$f$||quote_ident(catord)||$f$) _values
$f$ INTO ret;
END; $BODY$ LANGUAGE plpgsql;
-- FUNCTION auto_crosstab_ordered(view_name, tab, row, cat, val, cattab, catcol, catord)
--
-- This function creates a VIEW containing a cross-table of input table.
-- It fetches the column names of the cross table ("categories") from
-- another table.
--
-- view_name - name of VIEW to be created
-- tab - input table. This table / view must have 3 columns:
-- "row", "category", "value".
-- row - column name of the "row" column
-- cat - column name of the "category" column
-- val - column name of the "value" column
-- cattab - another table holding the possible categories
-- catcol - column name in cattab to use as column label in the cross table
-- catord - column name in cattab to sort columns in the cross table
DROP FUNCTION auto_crosstab_ordered(TEXT, TEXT, TEXT, TEXT, TEXT, TEXT, TEXT, TEXT);
CREATE FUNCTION auto_crosstab_ordered(view_name TEXT, tab TEXT, rw TEXT,
cat TEXT, val TEXT, cattab TEXT, catcol TEXT, catord TEXT) RETURNS void
AS $BODY$ BEGIN
EXECUTE $f$
CREATE VIEW $f$||quote_ident(view_name)||$f$ AS
SELECT * FROM crosstab(
'SELECT $f$||quote_ident(rw)||$f$,
$f$||quote_ident(cat)||$f$,
$f$||quote_ident(val)||$f$
FROM $f$||quote_ident(tab)||$f$',
'SELECT DISTINCT ON($f$||quote_ident(catord)||$f$) $f$||quote_ident(catcol)||$f$
FROM $f$||quote_ident(cattab)||$f$ m
ORDER BY $f$||quote_ident(catord)||$f$'
) AS ($f$||get_crosstab_type(tab, rw, val, cattab, catcol, catord)||$f$)$f$;
END; $BODY$ LANGUAGE plpgsql;
-- FUNCTION auto_crosstab(view_name, tab, row, cat, val)
--
-- This function creates a VIEW containing a cross-table of input table.
-- It fetches the column names of the cross table ("categories") from
-- DISTINCT values of the 2nd column of the input table.
--
-- view_name - name of VIEW to be created
-- tab - input table. This table / view must have 3 columns:
-- "row", "category", "value".
-- row - column name of the "row" column
-- cat - column name of the "category" column
-- val - column name of the "value" column
DROP FUNCTION auto_crosstab(TEXT, TEXT, TEXT, TEXT, TEXT);
CREATE FUNCTION auto_crosstab(view_name TEXT, tab TEXT, rw TEXT, cat TEXT, val TEXT) RETURNS void
AS $$ BEGIN
PERFORM auto_crosstab_ordered(view_name, tab, rw, cat, val, tab, cat, cat);
END; $$ LANGUAGE plpgsql;
The code is working fine. you will receive output as dynamic query.
CREATE OR REPLACE FUNCTION public.pivotcode(tablename character varying, rowc character varying, colc character varying, cellc character varying, celldatatype character varying)
RETURNS character varying AS
$BODY$
declare
dynsql1 varchar;
dynsql2 varchar;
columnlist varchar;
begin
-- 1. retrieve list of column names.
dynsql1 = 'select string_agg(distinct ''_''||'||colc||'||'' '||celldatatype||''','','' order by ''_''||'||colc||'||'' '||celldatatype||''') from '||tablename||';';
execute dynsql1 into columnlist;
-- 2. set up the crosstab query
--create temp table temp as
dynsql2 = 'select * from crosstab ( ''select '||rowc||','||colc||','||cellc||' from '||tablename||' group by 1,2 order by 1,2'', ''select distinct '||colc||' from '||tablename||' order by 1'' ) as newtable ( '||rowc||' varchar,'||columnlist||' );';
return dynsql2;
end
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
This is an example of pivoting data - in postgreSQL 9 there is a crosstab function to do this: http://www.postgresql.org/docs/current/static/tablefunc.html
I solved it folowing this article:
http://okbob.blogspot.com/2008/08/using-cursors-for-generating-cross.html
In short, i used function that for every result in a query dynamically creates the values for the next query, and returns the result needed as a refcursor. This solved my sql result part, now i need to figure out the java part, but that isnt connected much to the question :)
I'm new to Postgres and SQL. I created the following script that draws a line from a point to a projected point on the nearest line. It works fine on a small data set 5 to 10 points with the same number of lines; however, doing it on 60 points with 2,000 lines, the query takes about 12 hours. it is based on a nearest neighbour function pasted below as well from http://www.bostongis.com/downloads/pgis_nn.txt
EDIT documentation on pgis_fn_nn is available on http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgis_nearest_neighbor_generic
The slow part is the implementation of pgis_fn_nn(...)
What am I doing wrong?
Are there any tips to make this faster ?
Is there a way I can improve on both scripts?
What would you recommend if I want to combine both queries into one?
my_script.sql
-- this sql script creates a line table that connects points from a point table
-- to the projected points from the nearest line to the point of oritin
-- delete duplicate tables if they exist
DROP TABLE exploded_roads;
DROP TABLE projected_points;
DROP TABLE lines_from_centroids_to_roads;
-- create temporary exploaded lines table
CREATE TABLE exploded_roads (
the_geom geometry,
edge_id serial
);
-- insert the linestring that are not multistring
INSERT INTO exploded_roads
SELECT the_geom
FROM "StreetCenterLines"
WHERE st_geometrytype(the_geom) = 'ST_LineString';
-- insert the linestrings that need to be converted from multi string
INSERT INTO exploded_roads
SELECT the_geom
FROM (
SELECT ST_GeometryN(
the_geom,
generate_series(1, ST_NumGeometries(the_geom)))
AS the_geom
FROM "StreetCenterLines"
)
AS foo;
-- create projected points table with ids matching centroid table
CREATE TABLE projected_points (
the_geom geometry,
pid serial,
dauid int
);
-- Populate Table
-- code based on Paul Ramsey's site and Boston GIS' NN code
INSERT INTO projected_points(the_geom, dauid)
SELECT DISTINCT ON ("DAUID"::int)
(
ST_Line_Interpolate_Point(
(
SELECT the_geom
FROM exploded_roads
WHERE edge_id IN
(
SELECT nn_gid
FROM pgis_fn_nn(centroids.the_geom, 30000000, 1,10, 'exploded_roads', 'true', 'edge_id', 'the_geom')
)
),
ST_Line_Locate_Point(
exploded_roads.the_geom,
centroids.the_geom
)
)
),
(centroids."DAUID"::int)
FROM exploded_roads, fred_city_o6_da_centroids centroids;
-- Create Line tables
CREATE TABLE lines_from_centroids_to_roads (
the_geom geometry,
edge_id SERIAL
);
-- Populate Line Table
INSERT INTO lines_from_centroids_to_roads(
SELECT
ST_MakeLine( centroids.the_geom, projected_points.the_geom )
FROM projected_points, fred_city_o6_da_centroids centroids
WHERE projected_points.dauid = centroids.id
);
pgis_fn_nn from http://www.bostongis.com/downloads/pgis_nn.txt
---LAST UPDATED 8/2/2007 --
CREATE OR REPLACE FUNCTION expandoverlap_metric(a geometry, b geometry, maxe double precision, maxslice double precision)
RETURNS integer AS
$BODY$
BEGIN
FOR i IN 0..maxslice LOOP
IF expand(a,maxe*i/maxslice) && b THEN
RETURN i;
END IF;
END LOOP;
RETURN 99999999;
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;
CREATE TYPE pgis_nn AS
(nn_gid integer, nn_dist numeric(16,5));
CREATE OR REPLACE FUNCTION _pgis_fn_nn(geom1 geometry, distguess double precision, numnn integer, maxslices integer, lookupset varchar(150), swhere varchar(5000), sgid2field varchar(100), sgeom2field varchar(100))
RETURNS SETOF pgis_nn AS
$BODY$
DECLARE
strsql text;
rec pgis_nn;
ncollected integer;
it integer;
--NOTE: it: the iteration we are currently at
--start at the bounding box of the object (expand 0) and move up until it has collected more objects than we need or it = maxslices whichever event happens first
BEGIN
ncollected := 0; it := 0;
WHILE ncollected < numnn AND it <= maxslices LOOP
strsql := 'SELECT currentit.' || sgid2field || ', distance(ref.geom, currentit.' || sgeom2field || ') as dist FROM ' || lookupset || ' as currentit, (SELECT geometry(''' || CAST(geom1 As text) || ''') As geom) As ref WHERE ' || swhere || ' AND distance(ref.geom, currentit.' || sgeom2field || ') <= ' || CAST(distguess As varchar(200)) || ' AND expand(ref.geom, ' || CAST(distguess*it/maxslices As varchar(100)) || ') && currentit.' || sgeom2field || ' AND expandoverlap_metric(ref.geom, currentit.' || sgeom2field || ', ' || CAST(distguess As varchar(200)) || ', ' || CAST(maxslices As varchar(200)) || ') = ' || CAST(it As varchar(100)) || ' ORDER BY distance(ref.geom, currentit.' || sgeom2field || ') LIMIT ' ||
CAST((numnn - ncollected) As varchar(200));
--RAISE NOTICE 'sql: %', strsql;
FOR rec in EXECUTE (strsql) LOOP
IF ncollected < numnn THEN
ncollected := ncollected + 1;
RETURN NEXT rec;
ELSE
EXIT;
END IF;
END LOOP;
it := it + 1;
END LOOP;
END
$BODY$
LANGUAGE 'plpgsql' STABLE;
CREATE OR REPLACE FUNCTION pgis_fn_nn(geom1 geometry, distguess double precision, numnn integer, maxslices integer, lookupset varchar(150), swhere varchar(5000), sgid2field varchar(100), sgeom2field varchar(100))
RETURNS SETOF pgis_nn AS
$BODY$
SELECT * FROM _pgis_fn_nn($1,$2, $3, $4, $5, $6, $7, $8);
$BODY$
LANGUAGE 'sql' STABLE;
I am using "the nearest" function to do pgRouting on OpenStreetMap data. At first I stumbled upon the fn_nn function you mention too, but a visit to the #postgis irc channel on irc.freenode.net helped me out. It turns out postgis has some fantastic linear functions that, when combined, answer everything you need!
You can find more on the linear functions at: http://postgis.refractions.net/documentation/manual-1.3/ch06.html#id2578698 but here is how I implemented it
select
line_interpolate_point(ways.the_geom,
line_locate_point(ways.the_geom, pnt))),')','')) as anchor_point,
-- returns the anchor point
line_locate_point(ways.the_geom, pnt) as anchor_percentage,
-- returns the percentage on the line where the anchor will
-- touch (number between 0 and 1)
CASE
WHEN line_locate_point(ways.the_geom, pnt) < 0.5 THEN ways.source
WHEN line_locate_point(ways.the_geom, pnt) > 0.5 THEN ways.target
END as node,
-- returns the nearest end node id
length_spheroid( st_line_substring(ways.the_geom,0,
line_locate_point(ways.the_geom, pnt)),
'SPHEROID[\"WGS 84\",6378137,298.257223563]' ) as length,
distance_spheroid(pnt, line_interpolate_point(ways.the_geom,
line_locate_point(ways.the_geom, pnt)),
'SPHEROID[\"WGS 84\",6378137,298.257223563]') as dist
from ways, planet_osm_line,
ST_GeomFromText('POINT(1.245 51.234)', 4326) as pnt
where ways.gid = planet_osm_line.osm_id
order by dist asc limit 1;";
Hope this is of any use to you