Postgres dynamic column headers (from another table)

Just a silly example:
Table A:
eggs
bread
cheese
Table B (when they are eaten):
egg    | date
bread  | date
egg    | date
cheese | date
bread  | date
For statistics purposes, I need statistics per date per food type, in a layout like this:
Table Statistics:
      | egg | bread | cheese
date1 |   2 |     1 |      0
date2 |   6 |     4 |      2
date3 |   2 |     0 |      0
I need the column headers to be dynamic in the report (if new food types are added, they should automatically appear).
Any idea how to do this in Postgres?
Thanks.

Based on the answer to Postgres dynamic column headers (from another table) (the work of Eric Vallabh Minikel), I improved the function to be more flexible and convenient. I think it might be useful for others too, especially as it relies only on PL/pgSQL and does not need extensions to be installed, as other derivations of Eric's work (e.g. PL/Python) do. Tested with 9.3.5, but it should also work at least down to 9.2.
Improvements:
deal with pivoted column names containing spaces
deal with multiple row header columns
deal with an aggregate function in the pivot cell as well as a non-aggregated pivot cell (the last parameter may be 'sum(cellval)' or just 'cellval' in case the underlying table/view already aggregates)
auto-detect the data type of the pivot cell (no need to pass it to the function any more)
Usage:
SELECT get_crosstab_statement('table_to_pivot', ARRAY['rowname' [, <other_row_header_columns_as_well>], 'colname', 'max(cellval)');
Code:
CREATE OR REPLACE FUNCTION get_crosstab_statement(tablename character varying, row_header_columns character varying[], pivot_headers_column character varying, pivot_values character varying)
RETURNS character varying AS
$BODY$
--returns the sql statement to use for pivoting the table
--based on: http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/
--based on: https://stackoverflow.com/questions/4104508/postgres-dynamic-column-headers-from-another-table
--based on: http://www.postgresonline.com/journal/categories/24-tablefunc
DECLARE
arrayname CONSTANT character varying := 'r';
row_headers_simple character varying;
row_headers_quoted character varying;
row_headers_castdown character varying;
row_headers_castup character varying;
row_header_count smallint;
row_header record;
pivot_values_columnname character varying;
pivot_values_datatype character varying;
pivot_headers_definition character varying;
pivot_headers_simple character varying;
sql_row_headers character varying;
sql_pivot_headers character varying;
sql_crosstab_result character varying;
BEGIN
-- 1. create row header definitions
row_headers_simple := array_to_string(row_header_columns, ', ');
row_headers_quoted := '''' || array_to_string(row_header_columns, ''', ''') || '''';
row_headers_castdown := array_to_string(row_header_columns, '::text, ') || '::text';
row_header_count := 0;
sql_row_headers := 'SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = ''' || tablename || ''' AND column_name IN (' || row_headers_quoted || ')';
FOR row_header IN EXECUTE sql_row_headers LOOP
row_header_count := row_header_count + 1;
row_headers_castup := COALESCE(row_headers_castup || ', ', '') || arrayname || '[' || row_header_count || ']::' || row_header.data_type || ' AS ' || row_header.column_name;
END LOOP;
-- 2. retrieve basic column name in case an aggregate function is used
SELECT coalesce(substring(pivot_values FROM '.*\((.*)\)'), pivot_values)
INTO pivot_values_columnname;
-- 3. retrieve pivot values datatype
SELECT data_type
FROM information_schema.columns
WHERE table_name = tablename AND column_name = pivot_values_columnname
INTO pivot_values_datatype;
-- 4. retrieve list of pivot column names.
sql_pivot_headers := 'SELECT string_agg(DISTINCT quote_ident(' || pivot_headers_column || '), '', '' ORDER BY quote_ident(' || pivot_headers_column || ')) as names, string_agg(DISTINCT quote_ident(' || pivot_headers_column || ') || '' ' || pivot_values_datatype || ''', '', '' ORDER BY quote_ident(' || pivot_headers_column || ') || '' ' || pivot_values_datatype || ''') as definitions FROM ' || tablename || ';';
EXECUTE sql_pivot_headers INTO pivot_headers_simple, pivot_headers_definition;
-- 5. set up the crosstab query
sql_crosstab_result := 'SELECT ' || replace (row_headers_castup || ', ' || pivot_headers_simple, ', ', ',
') || '
FROM crosstab (
''SELECT ARRAY[' || row_headers_castdown || '] AS ' || arrayname || ', ' || pivot_headers_column || ', ' || pivot_values || '
FROM ' || tablename || '
GROUP BY ' || row_headers_simple || ', ' || pivot_headers_column || (CASE pivot_values_columnname=pivot_values WHEN true THEN ', ' || pivot_values ELSE '' END) || '
ORDER BY ' || row_headers_simple || '''
,
''SELECT DISTINCT ' || pivot_headers_column || '
FROM ' || tablename || '
ORDER BY ' || pivot_headers_column || '''
) AS newtable (
' || arrayname || ' varchar[]' || ',
' || replace(pivot_headers_definition, ', ', ',
') || '
);';
RETURN sql_crosstab_result;
END
$BODY$
LANGUAGE plpgsql STABLE
COST 100;

I came across the same problem, and found an alternative solution. I'd like to ask for comments here.
The idea is to create the "output type" string for crosstab dynamically. The end result can't be returned by a plpgsql function, because that function would either need to have a static return type (which we don't have) or return setof record, thus having no advantage over the original crosstab function. Therefore my function saves the output cross-table in a view. Likewise, the input table of "pivot" data not yet in cross-table format is taken from an existing view or table.
The usage would be like this, using your example (I added the "fat" field to illustrate the sorting feature):
CREATE TABLE food_fat (name character varying(20) NOT NULL, fat integer);
CREATE TABLE eaten (food character varying(20) NOT NULL, day date NOT NULL);
-- This view will be formatted as cross-table.
-- ORDER BY is important, otherwise crosstab won't merge rows
CREATE TEMPORARY VIEW mymeals AS
SELECT day,food,COUNT(*) AS meals FROM eaten
GROUP BY day, food ORDER BY day;
SELECT auto_crosstab_ordered('mymeals_cross',
'mymeals', 'day', 'food', 'meals', -- what table to convert to cross-table
'food_fat', 'name', 'fat'); -- where to take the columns from
The last statement creates a view mymeals_cross that looks like this:
SELECT * FROM mymeals_cross;
day | bread | cheese | eggs
------------+-------+--------+------
2012-06-01 | 3 | 3 | 2
2012-06-02 | 2 | 1 | 3
(2 rows)
Here comes my implementation:
-- FUNCTION get_col_type(tab, col)
-- returns the data type of column <col> in table <tab> as string
DROP FUNCTION get_col_type(TEXT, TEXT);
CREATE FUNCTION get_col_type(tab TEXT, col TEXT, OUT ret TEXT)
AS $BODY$ BEGIN
EXECUTE $f$
SELECT atttypid::regtype::text
FROM pg_catalog.pg_attribute
WHERE attrelid='$f$||quote_ident(tab)||$f$'::regclass
AND attname='$f$||quote_ident(col)||$f$'
$f$ INTO ret;
END;
$BODY$ LANGUAGE plpgsql;
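For instance, assuming the eaten table from the example above exists:
SELECT get_col_type('eaten', 'day');  -- returns 'date'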
-- FUNCTION get_crosstab_type(tab, row, val, cattab, catcol, catord)
--
-- This function generates the output type expression for the crosstab(text, text)
-- function from the PostgreSQL tablefunc module when the "categories"
-- (cross table column labels) can be looked up in some view or table.
--
-- See auto_crosstab below for parameters
DROP FUNCTION get_crosstab_type(TEXT, TEXT, TEXT, TEXT, TEXT, TEXT);
CREATE FUNCTION get_crosstab_type(tab TEXT, rw TEXT, val TEXT, cattab TEXT,
catcol TEXT, catord TEXT, OUT ret TEXT)
AS $BODY$ BEGIN
EXECUTE $f$
SELECT '"$f$||quote_ident(rw)||$f$" $f$
||get_col_type(quote_ident(tab), quote_ident(rw))||$f$'
|| string_agg(',"'||_values._v||'" $f$
||get_col_type(quote_ident(tab), quote_ident(val))||$f$')
FROM (SELECT DISTINCT ON(_t.$f$||quote_ident(catord)||$f$) _t.$f$||quote_ident(catcol)||$f$ AS _v
FROM $f$||quote_ident(cattab)||$f$ _t
ORDER BY _t.$f$||quote_ident(catord)||$f$) _values
$f$ INTO ret;
END; $BODY$ LANGUAGE plpgsql;
-- FUNCTION auto_crosstab_ordered(view_name, tab, row, cat, val, cattab, catcol, catord)
--
-- This function creates a VIEW containing a cross-table of input table.
-- It fetches the column names of the cross table ("categories") from
-- another table.
--
-- view_name - name of VIEW to be created
-- tab - input table. This table / view must have 3 columns:
-- "row", "category", "value".
-- row - column name of the "row" column
-- cat - column name of the "category" column
-- val - column name of the "value" column
-- cattab - another table holding the possible categories
-- catcol - column name in cattab to use as column label in the cross table
-- catord - column name in cattab to sort columns in the cross table
DROP FUNCTION auto_crosstab_ordered(TEXT, TEXT, TEXT, TEXT, TEXT, TEXT, TEXT, TEXT);
CREATE FUNCTION auto_crosstab_ordered(view_name TEXT, tab TEXT, rw TEXT,
cat TEXT, val TEXT, cattab TEXT, catcol TEXT, catord TEXT) RETURNS void
AS $BODY$ BEGIN
EXECUTE $f$
CREATE VIEW $f$||quote_ident(view_name)||$f$ AS
SELECT * FROM crosstab(
'SELECT $f$||quote_ident(rw)||$f$,
$f$||quote_ident(cat)||$f$,
$f$||quote_ident(val)||$f$
FROM $f$||quote_ident(tab)||$f$',
'SELECT DISTINCT ON($f$||quote_ident(catord)||$f$) $f$||quote_ident(catcol)||$f$
FROM $f$||quote_ident(cattab)||$f$ m
ORDER BY $f$||quote_ident(catord)||$f$'
) AS ($f$||get_crosstab_type(tab, rw, val, cattab, catcol, catord)||$f$)$f$;
END; $BODY$ LANGUAGE plpgsql;
-- FUNCTION auto_crosstab(view_name, tab, row, cat, val)
--
-- This function creates a VIEW containing a cross-table of input table.
-- It fetches the column names of the cross table ("categories") from
-- DISTINCT values of the 2nd column of the input table.
--
-- view_name - name of VIEW to be created
-- tab - input table. This table / view must have 3 columns:
-- "row", "category", "value".
-- row - column name of the "row" column
-- cat - column name of the "category" column
-- val - column name of the "value" column
DROP FUNCTION auto_crosstab(TEXT, TEXT, TEXT, TEXT, TEXT);
CREATE FUNCTION auto_crosstab(view_name TEXT, tab TEXT, rw TEXT, cat TEXT, val TEXT) RETURNS void
AS $$ BEGIN
PERFORM auto_crosstab_ordered(view_name, tab, rw, cat, val, tab, cat, cat);
END; $$ LANGUAGE plpgsql;
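For the simpler case where the column labels are just the DISTINCT values of the category column, the wrapper can be called directly. A sketch, reusing the mymeals view from above (mymeals_cross2 is a hypothetical view name):
SELECT auto_crosstab('mymeals_cross2', 'mymeals', 'day', 'food', 'meals');
SELECT * FROM mymeals_cross2;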

The code works fine; you will receive the output as a dynamic query.
CREATE OR REPLACE FUNCTION public.pivotcode(tablename character varying, rowc character varying, colc character varying, cellc character varying, celldatatype character varying)
  RETURNS character varying AS
$BODY$
declare
  dynsql1 varchar;
  dynsql2 varchar;
  columnlist varchar;
begin
  -- 1. retrieve list of column names.
  dynsql1 = 'select string_agg(distinct ''_''||'||colc||'||'' '||celldatatype||''','','' order by ''_''||'||colc||'||'' '||celldatatype||''') from '||tablename||';';
  execute dynsql1 into columnlist;
  -- 2. set up the crosstab query
  --create temp table temp as
  dynsql2 = 'select * from crosstab ( ''select '||rowc||','||colc||','||cellc||' from '||tablename||' group by 1,2 order by 1,2'', ''select distinct '||colc||' from '||tablename||' order by 1'' ) as newtable ( '||rowc||' varchar,'||columnlist||' );';
  return dynsql2;
end
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

This is an example of pivoting data. In PostgreSQL 9 there is a crosstab function (in the tablefunc module) to do this: http://www.postgresql.org/docs/current/static/tablefunc.html
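For reference, the static version with hard-coded columns looks like this. A sketch against the question's eaten table, assuming CREATE EXTENSION tablefunc has been run:
SELECT *
FROM crosstab(
  $$SELECT day, food, count(*) FROM eaten GROUP BY 1, 2 ORDER BY 1, 2$$,
  $$SELECT DISTINCT food FROM eaten ORDER BY 1$$
) AS t(day date, bread bigint, cheese bigint, eggs bigint);
The dynamic solutions above exist precisely to generate that trailing column definition list automatically.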

I solved it following this article:
http://okbob.blogspot.com/2008/08/using-cursors-for-generating-cross.html
In short, I used a function that, for every result in a query, dynamically creates the values for the next query, and returns the result needed as a refcursor. This solved the SQL part; now I need to figure out the Java part, but that isn't really connected to the question :)
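A minimal sketch of that idea (the eaten table from above is assumed; the function builds one sum(...) column per distinct food and hands the result back as a refcursor):
CREATE OR REPLACE FUNCTION eaten_stats_cursor()
RETURNS refcursor
LANGUAGE plpgsql AS
$$
DECLARE
  cols text;
  cur  refcursor := 'eaten_stats_cur';
BEGIN
  -- one sum(...) column per distinct food value
  SELECT string_agg(format('sum((food = %L)::int) AS %I', food, food), ', ' ORDER BY food)
  INTO cols
  FROM (SELECT DISTINCT food FROM eaten) f;
  OPEN cur FOR EXECUTE format('SELECT day, %s FROM eaten GROUP BY day ORDER BY day', cols);
  RETURN cur;
END
$$;
-- usage, inside a transaction:
-- BEGIN;
-- SELECT eaten_stats_cursor();
-- FETCH ALL IN eaten_stats_cur;
-- COMMIT;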

Related

Apply function to all columns in a Postgres table dynamically

Using Postgres 13.1, I want to apply a forward fill function to all columns of a table. The forward fill function is explained in my earlier question:
How to do forward fill as a PL/PGSQL function
However, in that case the columns and table are specified. I want to take that code and apply it to an arbitrary table, i.e. specify a table and have the forward fill applied to each of its columns.
Using this table as an example:
CREATE TABLE example(row_num int, id int, str text, val integer);
INSERT INTO example VALUES
(1, 1, '1a', NULL)
, (2, 1, NULL, 1)
, (3, 2, '2a', 2)
, (4, 2, NULL, NULL)
, (5, 3, NULL, NULL)
, (6, 3, '3a', 31)
, (7, 3, NULL, NULL)
, (8, 3, NULL, 32)
, (9, 3, '3b', NULL)
, (10,3, NULL, NULL)
;
I start with the following working base for the function. I call it passing in some variable names; note that the first is a table name, not a column name. The function takes the table name, creates an array of all the column names, and then outputs the names.
create or replace function col_collect(tbl text, id text, row_num text)
returns text[]
language plpgsql as
$func$
declare
tmp text[];
col text;
begin
select array (
select column_name
from information_schema."columns" c
where table_name = tbl
) into tmp;
foreach col in array tmp
loop
raise notice 'col: %', col;
end loop;
return tmp;
end
$func$;
I want to apply the "forward fill" function I got from my earlier question to each column of a table. UPDATE seems to be the correct approach. So this is the preceding function, where I replace the RAISE NOTICE with an UPDATE using EXECUTE so I can pass in the table name:
create or replace function col_collect(tbl text, id text, row_num text)
returns void
language plpgsql as
$func$
declare
tmp text[];
col text;
begin
select array (
select column_name
from information_schema."columns" c
where table_name = tbl
) into tmp;
foreach col in array tmp
loop
execute 'update '||tbl||'
set '||col||' = gapfill('||col||') OVER w AS '||col||'
where '||tbl||'.row_num = '||col||'.row_num
window w as (PARTITION BY '||id||' ORDER BY '||row_num||')
returning *;';
end loop;
end
$func$;
-- call the function
select col_collect('example','id','row_num')
The preceding errors out with a syntax error. I have tried many variations on this, but they all fail. Helpful answers on SO were here and here. The aggregate function I'm trying to apply (as a window function) is:
CREATE OR REPLACE FUNCTION gap_fill_internal(s anyelement, v anyelement)
RETURNS anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN COALESCE(v, s); -- that's all!
END
$func$;
CREATE AGGREGATE gap_fill(anyelement) (
SFUNC = gap_fill_internal,
STYPE = anyelement
);
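For context, the aggregate is used as a window function to carry the last non-null value forward within each id group, e.g.:
SELECT row_num, id,
       gap_fill(str) OVER (PARTITION BY id ORDER BY row_num) AS str,
       gap_fill(val) OVER (PARTITION BY id ORDER BY row_num) AS val
FROM example
ORDER BY row_num;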
My questions are:
is this a good approach and if so what am I doing wrong; or
is there a better way to do this?
What you ask is not a trivial task. You should be comfortable with PL/pgSQL; I do not advise this kind of dynamic SQL query for beginners, as it is too powerful.
That said, let's dive in. Buckle up!
CREATE OR REPLACE FUNCTION f_gap_fill_update(_tbl regclass, _id text, _row_num text, OUT nullable_columns int, OUT updated_rows int)
LANGUAGE plpgsql AS
$func$
DECLARE
_pk text := quote_ident(_row_num);
_sql text;
BEGIN
SELECT INTO _sql, nullable_columns
concat_ws(E'\n'
, 'UPDATE ' || _tbl || ' t'
, 'SET (' || string_agg( quote_ident(a.attname), ', ') || ')'
, ' = (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
, 'FROM ('
, ' SELECT ' || _pk
, ' , ' || string_agg(format('gap_fill(%1$I) OVER w AS %1$I', a.attname), ', ')
, ' FROM ' || _tbl
, format(' WINDOW w AS (PARTITION BY %I ORDER BY %s)', _id, _pk)
, ' ) u'
, format('WHERE t.%1$s = u.%1$s', _pk)
, 'AND (' || string_agg('t.' || quote_ident(a.attname), ', ') || ') IS DISTINCT FROM'
, ' (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
)
, count(*) -- AS _col_ct
FROM (
SELECT a.attname
FROM pg_attribute a
WHERE a.attrelid = _tbl
AND a.attnum > 0
AND NOT a.attisdropped
AND NOT a.attnotnull
ORDER BY a.attnum
) a;
IF nullable_columns = 0 THEN
RAISE EXCEPTION 'No nullable columns found in table >>%<<', _tbl;
ELSIF _sql IS NULL THEN
RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
END IF;
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql; -- execute
GET DIAGNOSTICS updated_rows = ROW_COUNT;
END
$func$;
Example call:
SELECT * FROM f_gap_fill_update('example', 'id', 'row_num');
db<>fiddle here
The function is state of the art.
Generates and executes a query of the form:
UPDATE tbl t
SET (str, val, col1)
= (u.str, u.val, u.col1)
FROM (
SELECT row_num
, gap_fill(str) OVER w AS str, gap_fill(val) OVER w AS val
, gap_fill(col1) OVER w AS col1
FROM tbl
WINDOW w AS (PARTITION BY id ORDER BY row_num)
) u
WHERE t.row_num = u.row_num
AND (t.str, t.val, t.col1) IS DISTINCT FROM
(u.str, u.val, u.col1)
Using pg_catalog.pg_attribute instead of the information schema. See:
"Information schema vs. system catalogs"
Note the final WHERE clause to prevent (possibly expensive) empty updates. Only rows that actually change will be written. See:
How do I (or can I) SELECT DISTINCT on multiple columns?
Moreover, only nullable columns (not defined NOT NULL) will even be considered, to avoid unnecessary work.
Using ROW syntax in UPDATE to keep the code simple. See:
SQL update fields of one table from fields of another one
The function returns two integer values: nullable_columns and updated_rows, reporting what the names suggest.
The function defends against SQL injection properly. See:
Table name as a PostgreSQL function parameter
SQL injection in Postgres functions vs prepared queries
About GET DIAGNOSTICS:
Calculate number of rows affected by batch query in PostgreSQL
The above function updates, but does not return rows. Here is a basic demo of how to return rows of varying type:
CREATE OR REPLACE FUNCTION f_gap_fill_select(_tbl_type anyelement, _id text, _row_num text)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := pg_typeof(_tbl_type)::text::regclass;
_sql text;
BEGIN
SELECT INTO _sql
'SELECT ' || string_agg(CASE WHEN a.attnotnull
THEN format('%I', a.attname)
ELSE format('gap_fill(%1$I) OVER w AS %1$I', a.attname) END
, ', ' ORDER BY a.attnum)
|| E'\nFROM ' || _tbl
|| format(E'\nWINDOW w AS (PARTITION BY %I ORDER BY %I)', _id, _row_num)
FROM pg_attribute a
WHERE a.attrelid = _tbl
AND a.attnum > 0
AND NOT a.attisdropped;
IF _sql IS NULL THEN
RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
END IF;
RETURN QUERY EXECUTE _sql;
-- RAISE NOTICE '%', _sql; -- debug
END
$func$;
Call (note special syntax!):
SELECT * FROM f_gap_fill_select(NULL::example, 'id', 'row_num');
db<>fiddle here
About returning a polymorphic row type:
Refactor a PL/pgSQL function to return the output of various SELECT queries

Adding a new column at certain place in Postgres [duplicate]

How can I add a new column to a table after the 2nd or 3rd column using Postgres?
My code looks as follows
ALTER TABLE n_domains ADD COLUMN contract_nr int after owner_id
No, there's no direct way to do that. And there's a reason for it - every query should list all the fields it needs in whatever order (and format etc) it needs them, thus making the order of the columns in one table insignificant.
If you really need to do that I can think of one workaround:
dump and save the description of the table in question (using pg_dump --schema-only --table=<schema.table> ...)
add the column you want where you want it in the saved definition
rename the table in the saved definition so not to clash with the name of the old table when you attempt to create it
create the new table using this definition
populate the new table with the data from the old table using 'INSERT INTO <new_table> SELECT field1, field2, <default_for_new_field>, field3,... FROM <old_table>';
rename the old table
rename the new table to the original name
eventually drop the old, renamed table after you make sure everything's alright
The order of columns is not irrelevant: putting fixed-width columns at the front of the table can optimize the storage layout of your data. It can also make working with your data easier outside of your application code.
PostgreSQL does not support altering the column ordering (see Alter column position on the PostgreSQL wiki); if the table is relatively isolated, your best bet is to recreate the table:
CREATE TABLE foobar_new ( ... );
INSERT INTO foobar_new SELECT ... FROM foobar;
DROP TABLE foobar CASCADE;
ALTER TABLE foobar_new RENAME TO foobar;
If you have a lot of views or constraints defined against the table, you can re-add all the columns after the new column and drop the original columns (see the PostgreSQL wiki for an example).
The real problem here is that it's not done yet. Currently PostgreSQL's logical ordering is the same as the physical ordering. That's problematic because you can't get a different logical ordering, but it's even worse because the table isn't physically packed automatically, so by moving columns you can get different performance characteristics.
Arguing that it's that way by intent in design is pointless. It's somewhat likely to change at some point when an acceptable patch is submitted.
All of that said, is it a good idea to rely on the ordinal positioning of columns, logical or physical? Hell no. In production code you should never be using an implicit ordering or *. Why make the code more brittle than it needs to be? Correctness should always be a higher priority than saving a few keystrokes.
As a workaround, you can in fact modify the column ordering by recreating the table, or through the "add and reorder" game.
See also,
Column tetris reordering in order to make things more space-efficient
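A quick sketch of the space aspect (assuming a 64-bit build, where each value is aligned to its type's alignment boundary):
-- the bigints force 4 bytes of padding after each int
CREATE TABLE bad  (a int, b bigint, c int, d bigint);  -- 4+4(pad)+8+4+4(pad)+8 = 32 bytes of column data per row
CREATE TABLE good (b bigint, d bigint, a int, c int);  -- 8+8+4+4 = 24 bytes per row, same content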
The column order is relevant to me, so I created this function. See if it helps. It works with indexes, primary keys, and triggers. Views, foreign keys, and some other features are missing.
Example:
SELECT xaddcolumn('table', 'col3 int NOT NULL DEFAULT 0', 'col2');
Source code:
CREATE OR REPLACE FUNCTION xaddcolumn(ptable text, pcol text, pafter text) RETURNS void AS $BODY$
DECLARE
rcol RECORD;
rkey RECORD;
ridx RECORD;
rtgr RECORD;
vsql text;
vkey text;
vidx text;
cidx text;
vtgr text;
ctgr text;
etgr text;
vseq text;
vtype text;
vcols text;
BEGIN
EXECUTE 'CREATE TABLE zzz_' || ptable || ' AS SELECT * FROM ' || ptable;
--columns
vseq = '';
vcols = '';
vsql = 'CREATE TABLE ' || ptable || '(';
FOR rcol IN SELECT column_name as col, udt_name as coltype, column_default as coldef,
is_nullable as is_null, character_maximum_length as len,
numeric_precision as num_prec, numeric_scale as num_scale
FROM information_schema.columns
WHERE table_name = ptable
ORDER BY ordinal_position
LOOP
vtype = rcol.coltype;
IF (substr(rcol.coldef,1,7) = 'nextval') THEN
vtype = 'serial';
vseq = vseq || 'SELECT setval(''' || ptable || '_' || rcol.col || '_seq'''
|| ', max(' || rcol.col || ')) FROM ' || ptable || ';';
ELSIF (vtype = 'bpchar') THEN
vtype = 'char';
END IF;
vsql = vsql || E'\n' || rcol.col || ' ' || vtype;
IF (vtype in ('varchar', 'char')) THEN
vsql = vsql || '(' || rcol.len || ')';
ELSIF (vtype = 'numeric') THEN
vsql = vsql || '(' || rcol.num_prec || ',' || rcol.num_scale || ')';
END IF;
IF (rcol.is_null = 'NO') THEN
vsql = vsql || ' NOT NULL';
END IF;
IF (rcol.coldef <> '' AND vtype <> 'serial') THEN
vsql = vsql || ' DEFAULT ' || rcol.coldef;
END IF;
vsql = vsql || E',';
vcols = vcols || rcol.col || ',';
--
IF (rcol.col = pafter) THEN
vsql = vsql || E'\n' || pcol || ',';
END IF;
END LOOP;
vcols = substr(vcols,1,length(vcols)-1);
--keys
vkey = '';
FOR rkey IN SELECT constraint_name as name, column_name as col
FROM information_schema.key_column_usage
WHERE table_name = ptable
LOOP
IF (vkey = '') THEN
vkey = E'\nCONSTRAINT ' || rkey.name || ' PRIMARY KEY (';
END IF;
vkey = vkey || rkey.col || ',';
END LOOP;
IF (vkey <> '') THEN
vsql = vsql || substr(vkey,1,length(vkey)-1) || ') ';
END IF;
vsql = substr(vsql,1,length(vsql)-1) || ') WITHOUT OIDS';
--index
vidx = '';
cidx = '';
FOR ridx IN SELECT s.indexrelname as nome, a.attname as col
FROM pg_index i LEFT JOIN pg_class c ON c.oid = i.indrelid
LEFT JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum = ANY(i.indkey)
LEFT JOIN pg_stat_user_indexes s USING (indexrelid)
WHERE c.relname = ptable AND i.indisunique != 't' AND i.indisprimary != 't'
ORDER BY s.indexrelname
LOOP
IF (ridx.nome <> cidx) THEN
IF (vidx <> '') THEN
vidx = substr(vidx,1,length(vidx)-1) || ');';
END IF;
cidx = ridx.nome;
vidx = vidx || E'\nCREATE INDEX ' || cidx || ' ON ' || ptable || ' (';
END IF;
vidx = vidx || ridx.col || ',';
END LOOP;
IF (vidx <> '') THEN
vidx = substr(vidx,1,length(vidx)-1) || ')';
END IF;
--trigger
vtgr = '';
ctgr = '';
etgr = '';
FOR rtgr IN SELECT trigger_name as nome, event_manipulation as eve,
action_statement as act, condition_timing as cond
FROM information_schema.triggers
WHERE event_object_table = ptable
LOOP
IF (rtgr.nome <> ctgr) THEN
IF (vtgr <> '') THEN
vtgr = replace(vtgr, '_#eve_', substr(etgr,1,length(etgr)-3));
END IF;
etgr = '';
ctgr = rtgr.nome;
vtgr = vtgr || 'CREATE TRIGGER ' || ctgr || ' ' || rtgr.cond || ' _#eve_ '
|| 'ON ' || ptable || ' FOR EACH ROW ' || rtgr.act || ';';
END IF;
etgr = etgr || rtgr.eve || ' OR ';
END LOOP;
IF (vtgr <> '') THEN
vtgr = replace(vtgr, '_#eve_', substr(etgr,1,length(etgr)-3));
END IF;
--drop the old table and create the new one
EXECUTE 'DROP TABLE ' || ptable;
IF (EXISTS (SELECT sequence_name FROM information_schema.sequences
WHERE sequence_name = ptable||'_id_seq'))
THEN
EXECUTE 'DROP SEQUENCE '||ptable||'_id_seq';
END IF;
EXECUTE vsql;
--copy the data into the new table
EXECUTE 'INSERT INTO ' || ptable || '(' || vcols || ')' ||
E'\nSELECT ' || vcols || ' FROM zzz_' || ptable;
EXECUTE vseq;
EXECUTE vidx;
EXECUTE vtgr;
EXECUTE 'DROP TABLE zzz_' || ptable;
END;
$BODY$ LANGUAGE plpgsql VOLATILE COST 100;
@Jeremy Gustie's solution above almost works, but will do the wrong thing if the ordinals are off (or fail altogether if the reordered ordinals make incompatible types match). Give it a try:
CREATE TABLE test1 (one varchar, two varchar, three varchar);
CREATE TABLE test2 (three varchar, two varchar, one varchar);
INSERT INTO test1 (one, two, three) VALUES ('one', 'two', 'three');
INSERT INTO test2 SELECT * FROM test1;
SELECT * FROM test2;
The results show the problem:
testdb=> select * from test2;
three | two | one
-------+-----+-------
one | two | three
(1 row)
You can remedy this by specifying the column names in the insert:
INSERT INTO test2 (one, two, three) SELECT * FROM test1;
That gives you what you really want:
testdb=> select * from test2;
three | two | one
-------+-----+-----
three | two | one
(1 row)
The problem comes when you have legacy code that doesn't do this, as I indicated above in my comment on peufeu's reply.
Update: It occurred to me that instead of specifying the column names in the INSERT clause, you can specify them in the SELECT clause. You just have to reorder them to match the ordinals in the target table:
INSERT INTO test2 SELECT three, two, one FROM test1;
And you can of course do both to be very explicit:
INSERT INTO test2 (one, two, three) SELECT one, two, three FROM test1;
That gives you the same results as above, with the column values properly matched.
The order of the columns is totally irrelevant in relational databases
Yes.
For instance if you use Python, you would do :
cursor.execute( "SELECT id, name FROM users" )
for id, name in cursor:
print id, name
Or you would do :
cursor.execute( "SELECT * FROM users" )
for row in cursor:
print row['id'], row['name']
But no sane person would ever use positional results like this :
cursor.execute( "SELECT * FROM users" )
for id, name in cursor:
print id, name
Well, it's a visual goody for DBAs and could be implemented in the engine with minor performance loss. Add a column-order table to pg_catalog or wherever it's suited best, keep it in memory, and use it before certain queries. Why overthink such a small piece of eye candy?
@Milen A. Radev
The need for a fixed column order is not always defined by the query that pulls the columns. The values returned by pg_fetch_row do not include the associated column names, so the column order has to be defined by the SQL statement.
A simple SELECT * FROM would require innate knowledge of the table structure, and would sometimes cause issues if the order of the columns were to change.
Using pg_fetch_assoc is a more reliable method, as you can reference the column names and therefore use a simple SELECT * FROM.

Count the number of null values in an Oracle table?

I need to count the number of null values of all the columns in a table in Oracle.
For instance, I execute the following statements to create a table TEST and insert data.
CREATE TABLE TEST
( A VARCHAR2(20 BYTE),
B VARCHAR2(20 BYTE),
C VARCHAR2(20 BYTE)
);
Insert into TEST (A) values ('a');
Insert into TEST (B) values ('b');
Insert into TEST (C) values ('c');
Now, I write the following code to compute the number of null values in the table TEST:
declare
cnt number :=0;
temp number :=0;
begin
for r in ( select column_name, data_type
from user_tab_columns
where table_name = upper('test')
order by column_id )
loop
if r.data_type <> 'NOT NULL' then
select count(*) into temp FROM TEST where r.column_name IS NULL;
cnt := cnt + temp;
END IF;
end loop;
dbms_output.put_line('Total: '||cnt);
end;
/
It returns 0, when the expected value is 6.
Where is the error?
Thanks in advance.
Counting NULLs for each column
In order to count NULL values for all columns of a table T you could run
SELECT COUNT(*) - COUNT(col1) col1_nulls
, COUNT(*) - COUNT(col2) col2_nulls
,..
, COUNT(*) - COUNT(colN) colN_nulls
, COUNT(*) total_rows
FROM T
/
Where col1, col2, .., colN should be replaced with the actual names of the columns of table T.
Aggregate functions, like COUNT(), ignore NULL values, so COUNT(*) - COUNT(col) gives you the number of NULLs in each column.
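Applied to the TEST table from the question (3 rows, each with exactly one non-NULL column):
SELECT COUNT(*) - COUNT(A) a_nulls
     , COUNT(*) - COUNT(B) b_nulls
     , COUNT(*) - COUNT(C) c_nulls
     , COUNT(*) total_rows
FROM TEST;
-- A_NULLS  B_NULLS  C_NULLS  TOTAL_ROWS
--       2        2        2           3    => 6 NULLs in total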
Summarize all NULLs of a table
If you want to know how many fields are NULL (that is, every NULL of every record), you can:
WITH d as (
SELECT COUNT(*) - COUNT(col1) col1_nulls
, COUNT(*) - COUNT(col2) col2_nulls
,..
, COUNT(*) - COUNT(colN) colN_nulls
, COUNT(*) total_rows
FROM T
) SELECT col1_nulls + col2_nulls +..+ colN_nulls
FROM d
/
Summarize all NULLs of a table (using Oracle dictionary tables)
The following is an improvement in which you need to know nothing but the table name, and it is very easy to code a function based on it:
DECLARE
T VARCHAR2(64) := '<YOUR TABLE NAME>';
expr VARCHAR2(32767);
q INTEGER;
BEGIN
SELECT 'SELECT /*+FULL(T) PARALLEL(T)*/' || COUNT(*) || ' * COUNT(*) OVER () - ' || LISTAGG('COUNT(' || COLUMN_NAME || ')', ' + ') WITHIN GROUP (ORDER BY COLUMN_ID) || ' FROM ' || T
INTO expr
FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = T;
-- This line is for debugging purposes only
DBMS_OUTPUT.PUT_LINE(expr);
EXECUTE IMMEDIATE expr INTO q;
DBMS_OUTPUT.PUT_LINE(q);
END;
/
Because the calculation implies a full table scan, the code produced in the expr variable was optimized for parallel execution.
User defined function null_fields
The function version also includes an optional parameter so it can be run against other schemas.
CREATE OR REPLACE FUNCTION null_fields(table_name IN VARCHAR2, owner IN VARCHAR2 DEFAULT USER)
RETURN INTEGER IS
T VARCHAR2(64) := UPPER(table_name);
o VARCHAR2(64) := UPPER(owner);
expr VARCHAR2(32767);
q INTEGER;
BEGIN
SELECT 'SELECT /*+FULL(T) PARALLEL(T)*/' || COUNT(*) || ' * COUNT(*) OVER () - ' || listagg('COUNT(' || column_name || ')', ' + ') WITHIN GROUP (ORDER BY column_id) || ' FROM ' || o || '.' || T || ' t'
INTO expr
FROM all_tab_columns
WHERE table_name = T;
EXECUTE IMMEDIATE expr INTO q;
RETURN q;
END;
/
-- Usage 1
SELECT null_fields('<your table name>') FROM dual
/
-- Usage 2
SELECT null_fields('<your table name>', '<table owner>') FROM dual
/
Thank you @Lord Peter:
The PL/SQL script below works.
declare
cnt number :=0;
temp number :=0;
begin
for r in ( select column_name, nullable
from user_tab_columns
where table_name = upper('test')
order by column_id )
loop
if r.nullable = 'Y' then
EXECUTE IMMEDIATE 'SELECT count(*) FROM test where '|| r.column_name ||' IS NULL' into temp ;
cnt := cnt + temp;
END IF;
end loop;
dbms_output.put_line('Total: '||cnt);
end;
/
The table name test may be replaced with the name of the table of interest.
I hope this solution is useful!
The dynamic SQL you execute (this is the string used in EXECUTE IMMEDIATE) should be
select sum(
decode(a,null,1,0)
+decode(b,null,1,0)
+decode(c,null,1,0)
) nullcols
from test;
Where each summand corresponds to a nullable column.
Here only one table scan is necessary to get the result.
Use the data dictionary to find the number of NULL values almost instantly:
select sum(num_nulls) sum_num_nulls
from all_tab_columns
where owner = user
and table_name = 'TEST';
SUM_NUM_NULLS
-------------
6
The values will only be correct if optimizer statistics were gathered recently and if they were gathered with the default value for the sample size.
Those may seem like large caveats but it's worth becoming familiar with your database's statistics gathering process anyway. If your database is not automatically gathering statistics or if your database is not using the default sample size those are likely huge problems you need to be aware of.
To manually gather stats for a specific table a statement like this will work:
begin
dbms_stats.gather_table_stats(user, 'TEST');
end;
/
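To check how fresh those statistics are before trusting num_nulls, the dictionary also records when and how they were gathered:
SELECT table_name, num_rows, sample_size, last_analyzed
FROM   user_tables
WHERE  table_name = 'TEST';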
select COUNT(1) TOTAL from table where COLUMN is NULL;

Transposing a table through a SELECT query

I have a table like:
Key | type | value
----+------+-------
 40 | A    | 12.34
 41 | A    | 10.24
 41 | B    | 12.89
I want it in the format:
Types |    40 |    41 |  42   (keys)
------+-------+-------+------
A     | 12.34 | 10.24 | XXX
B     |   YYY | 12.89 | ZZZ
How can this be done through an SQL query? CASE statements, DECODE?
What you're looking for is called a "pivot" (see also "Pivoting Operations" in the Oracle Database Data Warehousing Guide):
SELECT *
FROM tbl
PIVOT(SUM(value) FOR Key IN (40, 41, 42))
It was added to Oracle in 11g. Note that you need to specify the result columns (the values from the unpivoted column that become the pivoted column names) in the pivot clause. Any columns not specified in the pivot are implicitly grouped by. If you have columns in the original table that you don't wish to group by, select from a view or subquery, rather than from the table.
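For example, to make sure only type forms the rows, pivot a subquery that projects just the three columns involved (a sketch against the question's table):
SELECT *
FROM (SELECT type, Key, value FROM tbl)  -- drop any other columns before pivoting
PIVOT (SUM(value) FOR Key IN (40, 41, 42));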
You can engage in a bit of wizardry and get Oracle to create the statement for you, so that you don't need to figure out what column values to pivot on. In 11g, when you know the column values are numeric:
SELECT
'SELECT * FROM tbl PIVOT(SUM(value) FOR Key IN ('
|| LISTAGG(Key, ',') WITHIN GROUP (ORDER BY Key)
|| '));'
FROM tbl;
If the column values might not be numeric (using doubled quotes to embed the quote characters in the generated statement):
SELECT
'SELECT * FROM tbl PIVOT(SUM(value) FOR Key IN ('''
|| LISTAGG(Key, ''',''') WITHIN GROUP (ORDER BY Key)
|| '''));'
FROM tbl;
LISTAGG probably repeats duplicates (would someone test this?), in which case you'd need:
SELECT
'SELECT * FROM tbl PIVOT(SUM(value) FOR Key IN ('''
|| LISTAGG(Key, ''',''') WITHIN GROUP (ORDER BY Key)
|| '''));'
FROM (SELECT DISTINCT Key FROM tbl);
You could go further, defining a function that takes a table name, aggregate expression and pivot column name that returns a pivot statement by first producing then evaluating the above statement. You could then define a procedure that takes the same arguments and produces the pivoted result. I don't have access to Oracle 11g to test it, but I believe it would look something like:
CREATE PACKAGE dynamic_pivot AS
-- creates a PIVOT statement dynamically
FUNCTION pivot_stmt (tbl_name IN varchar2(30),
pivot_col IN varchar2(30),
aggr IN varchar2(40),
quote_values IN BOOLEAN DEFAULT TRUE)
RETURN varchar2(300);
PRAGMA RESTRICT_REFERENCES (pivot_stmt, WNDS, RNPS);
-- creates & executes a PIVOT
PROCEDURE pivot_table (tbl_name IN varchar2(30),
pivot_col IN varchar2(30),
aggr IN varchar2(40),
quote_values IN BOOLEAN DEFAULT TRUE);
END dynamic_pivot;
CREATE PACKAGE BODY dynamic_pivot AS
FUNCTION pivot_stmt (
tbl_name IN varchar2(30),
pivot_col IN varchar2(30),
aggr_expr IN varchar2(40),
quote_values IN BOOLEAN DEFAULT TRUE
) RETURN varchar2(300)
IS
stmt VARCHAR2(400);
quote VARCHAR2(2) DEFAULT '';
BEGIN
IF quote_values THEN
quote := '\\\'';
END IF;
-- "\||" shows that you are still in the dynamic statement string
-- The input fields aren't sanitized, so this is vulnerable to injection
EXECUTE IMMEDIATE 'SELECT \'SELECT * FROM ' || tbl_name
|| ' PIVOT(' || aggr_expr || ' FOR ' || pivot_col
|| ' IN (' || quote || '\' \|| LISTAGG(' || pivot_col
|| ', \'' || quote || ',' || quote
|| '\') WITHIN GROUP (ORDER BY ' || pivot_col || ') \|| \'' || quote
|| '));\' FROM (SELECT DISTINCT ' || pivot_col || ' FROM ' || tbl_name || ');'
INTO stmt;
RETURN stmt;
END pivot_stmt;
PROCEDURE pivot_table (tbl_name IN varchar2(30), pivot_col IN varchar2(30), aggr_expr IN varchar2(40), quote_values IN BOOLEAN DEFAULT TRUE) IS
BEGIN
EXECUTE IMMEDIATE pivot_stmt(tbl_name, pivot_col, aggr_expr, quote_values);
END pivot_table;
END dynamic_pivot;
Note: the length of the tbl_name, pivot_col and aggr_expr parameters comes from the maximum table and column name length. Note also that the function is vulnerable to SQL injection.
In pre-11g, you can apply MySQL pivot statement generation techniques (which produces the type of query others have posted, based on explicitly defining a separate column for each pivot value).
Pivot does simplify things greatly. Before 11g however, you need to do this manually.
select
type,
sum(case when key = 40 then value end) as val_40,
sum(case when key = 41 then value end) as val_41,
sum(case when key = 42 then value end) as val_42
from my_table
group by type;
Never tried it, but it seems at least Oracle 11 has a PIVOT clause.
If you do not have access to 11g, you can use string aggregation and grouping to approximate what you are looking for, such as:
with data as(
SELECT 40 KEY , 'A' TYPE , 12.34 VALUE FROM DUAL UNION
SELECT 41 KEY , 'A' TYPE , 10.24 VALUE FROM DUAL UNION
SELECT 41 KEY , 'B' TYPE , 12.89 VALUE FROM DUAL
)
select
TYPE ,
wm_concat(KEY) KEY ,
wm_concat(VALUE) VALUE
from data
GROUP BY TYPE;
type | KEY   | VALUE
-----+-------+-------------
A    | 40,41 | 12.34,10.24
B    | 41    | 12.89
This is based on wm_concat as shown here: http://www.oracle-base.com/articles/misc/StringAggregationTechniques.php
I'm going to leave this here just in case it helps, but I think PIVOT or MikeyByCrikey's answers would best suit your needs after re-looking at your sample results.

Slow Postgres Query

I'm new to Postgres and SQL. I created the following script that draws a line from a point to a projected point on the nearest line. It works fine on a small data set of 5 to 10 points with the same number of lines; however, with 60 points and 2,000 lines the query takes about 12 hours. It is based on a nearest-neighbour function, pasted below as well, from http://www.bostongis.com/downloads/pgis_nn.txt
EDIT: documentation on pgis_fn_nn is available at http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgis_nearest_neighbor_generic
The slow part is the implementation of pgis_fn_nn(...)
What am I doing wrong?
Are there any tips to make this faster?
Is there a way I can improve on both scripts?
What would you recommend if I want to combine both queries into one?
my_script.sql
-- this sql script creates a line table that connects points from a point table
-- to the projected points on the nearest line to each point of origin
-- drop the tables if they already exist
DROP TABLE exploded_roads;
DROP TABLE projected_points;
DROP TABLE lines_from_centroids_to_roads;
-- create temporary exploded lines table
CREATE TABLE exploded_roads (
the_geom geometry,
edge_id serial
);
-- insert the linestrings that are not multilinestrings
INSERT INTO exploded_roads
SELECT the_geom
FROM "StreetCenterLines"
WHERE st_geometrytype(the_geom) = 'ST_LineString';
-- insert the linestrings that need to be converted from multilinestrings
INSERT INTO exploded_roads
SELECT the_geom
FROM (
SELECT ST_GeometryN(
the_geom,
generate_series(1, ST_NumGeometries(the_geom)))
AS the_geom
FROM "StreetCenterLines"
)
AS foo;
-- create projected points table with ids matching centroid table
CREATE TABLE projected_points (
the_geom geometry,
pid serial,
dauid int
);
-- Populate Table
-- code based on Paul Ramsey's site and Boston GIS' NN code
INSERT INTO projected_points(the_geom, dauid)
SELECT DISTINCT ON ("DAUID"::int)
(
ST_Line_Interpolate_Point(
(
SELECT the_geom
FROM exploded_roads
WHERE edge_id IN
(
SELECT nn_gid
FROM pgis_fn_nn(centroids.the_geom, 30000000, 1,10, 'exploded_roads', 'true', 'edge_id', 'the_geom')
)
),
ST_Line_Locate_Point(
exploded_roads.the_geom,
centroids.the_geom
)
)
),
(centroids."DAUID"::int)
FROM exploded_roads, fred_city_o6_da_centroids centroids;
-- Create Line tables
CREATE TABLE lines_from_centroids_to_roads (
the_geom geometry,
edge_id SERIAL
);
-- Populate Line Table
INSERT INTO lines_from_centroids_to_roads(
SELECT
ST_MakeLine( centroids.the_geom, projected_points.the_geom )
FROM projected_points, fred_city_o6_da_centroids centroids
WHERE projected_points.dauid = centroids.id
);
pgis_fn_nn from http://www.bostongis.com/downloads/pgis_nn.txt
---LAST UPDATED 8/2/2007 --
CREATE OR REPLACE FUNCTION expandoverlap_metric(a geometry, b geometry, maxe double precision, maxslice double precision)
RETURNS integer AS
$BODY$
BEGIN
FOR i IN 0..maxslice LOOP
IF expand(a,maxe*i/maxslice) && b THEN
RETURN i;
END IF;
END LOOP;
RETURN 99999999;
END;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;
CREATE TYPE pgis_nn AS
(nn_gid integer, nn_dist numeric(16,5));
CREATE OR REPLACE FUNCTION _pgis_fn_nn(geom1 geometry, distguess double precision, numnn integer, maxslices integer, lookupset varchar(150), swhere varchar(5000), sgid2field varchar(100), sgeom2field varchar(100))
RETURNS SETOF pgis_nn AS
$BODY$
DECLARE
strsql text;
rec pgis_nn;
ncollected integer;
it integer;
--NOTE: it: the iteration we are currently at
--start at the bounding box of the object (expand 0) and move up until it has collected more objects than we need or it = maxslices whichever event happens first
BEGIN
ncollected := 0; it := 0;
WHILE ncollected < numnn AND it <= maxslices LOOP
strsql := 'SELECT currentit.' || sgid2field || ', distance(ref.geom, currentit.' || sgeom2field || ') as dist FROM ' || lookupset || ' as currentit, (SELECT geometry(''' || CAST(geom1 As text) || ''') As geom) As ref WHERE ' || swhere || ' AND distance(ref.geom, currentit.' || sgeom2field || ') <= ' || CAST(distguess As varchar(200)) || ' AND expand(ref.geom, ' || CAST(distguess*it/maxslices As varchar(100)) || ') && currentit.' || sgeom2field || ' AND expandoverlap_metric(ref.geom, currentit.' || sgeom2field || ', ' || CAST(distguess As varchar(200)) || ', ' || CAST(maxslices As varchar(200)) || ') = ' || CAST(it As varchar(100)) || ' ORDER BY distance(ref.geom, currentit.' || sgeom2field || ') LIMIT ' ||
CAST((numnn - ncollected) As varchar(200));
--RAISE NOTICE 'sql: %', strsql;
FOR rec in EXECUTE (strsql) LOOP
IF ncollected < numnn THEN
ncollected := ncollected + 1;
RETURN NEXT rec;
ELSE
EXIT;
END IF;
END LOOP;
it := it + 1;
END LOOP;
END
$BODY$
LANGUAGE 'plpgsql' STABLE;
CREATE OR REPLACE FUNCTION pgis_fn_nn(geom1 geometry, distguess double precision, numnn integer, maxslices integer, lookupset varchar(150), swhere varchar(5000), sgid2field varchar(100), sgeom2field varchar(100))
RETURNS SETOF pgis_nn AS
$BODY$
SELECT * FROM _pgis_fn_nn($1,$2, $3, $4, $5, $6, $7, $8);
$BODY$
LANGUAGE 'sql' STABLE;
I am using "the nearest" function to do pgRouting on OpenStreetMap data. At first I stumbled upon the fn_nn function you mention too, but a visit to the #postgis IRC channel on irc.freenode.net helped me out. It turns out PostGIS has some fantastic linear functions that, when combined, answer everything you need!
You can find more on the linear functions at: http://postgis.refractions.net/documentation/manual-1.3/ch06.html#id2578698 but here is how I implemented it
select
line_interpolate_point(ways.the_geom,
line_locate_point(ways.the_geom, pnt)) as anchor_point,
-- returns the anchor point
line_locate_point(ways.the_geom, pnt) as anchor_percentage,
-- returns the percentage on the line where the anchor will
-- touch (number between 0 and 1)
CASE
WHEN line_locate_point(ways.the_geom, pnt) < 0.5 THEN ways.source
WHEN line_locate_point(ways.the_geom, pnt) > 0.5 THEN ways.target
END as node,
-- returns the nearest end node id
length_spheroid( st_line_substring(ways.the_geom, 0,
line_locate_point(ways.the_geom, pnt)),
'SPHEROID["WGS 84",6378137,298.257223563]' ) as length,
distance_spheroid(pnt, line_interpolate_point(ways.the_geom,
line_locate_point(ways.the_geom, pnt)),
'SPHEROID["WGS 84",6378137,298.257223563]') as dist
from ways, planet_osm_line,
ST_GeomFromText('POINT(1.245 51.234)', 4326) as pnt
where ways.gid = planet_osm_line.osm_id
order by dist asc limit 1;
Hope this is of any use to you