Select a dynamic set of columns from a table and get the sum for each - sql

Is it possible to do the following in Postgres:
SELECT column_name FROM information_schema.columns WHERE table_name = 'somereport' AND data_type = 'integer';
SELECT SUM(column_name[0]), SUM(column_name[1]), SUM(column_name[3]) FROM somereport;
In other words I need to select a group of columns from a table depending on certain criteria, and then sum each of those columns in the table.
I know I can do this in a loop, summing each column independently, but that obviously requires a separate query for every column returned by the information schema query. E.g.:
FOR r IN
    SELECT column_name FROM information_schema.columns
    WHERE table_name = 'somereport' AND data_type = 'integer'
LOOP
    EXECUTE 'SELECT SUM(' || quote_ident(r.column_name) || ') FROM somereport';
END LOOP;

This query creates the complete DML statement you are after:
WITH x AS (
   SELECT 'public'::text     AS _schema  -- provide schema name ..
        , 'somereport'::text AS _tbl     -- .. and table name once
   )
SELECT 'SELECT '
    || string_agg('sum(' || quote_ident(column_name)
                  || ') AS sum_' || quote_ident(column_name), ', ')
    || E'\nFROM ' || quote_ident(x._schema) || '.' || quote_ident(x._tbl)
FROM   x, information_schema.columns
WHERE  table_schema = _schema
AND    table_name = _tbl
AND    data_type = 'integer'
GROUP  BY x._schema, x._tbl;
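For instance, for a table with two integer columns int_col1 and other_int_col (hypothetical names, for illustration only), the generated statement would look like:
SELECT sum(int_col1) AS sum_int_col1, sum(other_int_col) AS sum_other_int_col
FROM public.somereport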
You can execute it separately or wrap this query in a plpgsql function and run the query automatically with EXECUTE:
Full automation
Tested with PostgreSQL 9.1.4
CREATE OR REPLACE FUNCTION f_get_sums(_schema text, _tbl text)
  RETURNS TABLE(names text[], sums bigint[]) AS
$BODY$
BEGIN

RETURN QUERY EXECUTE (
   SELECT 'SELECT ''{'
       || string_agg(quote_ident(c.column_name), ', ' ORDER BY c.column_name)
       || '}''::text[],
          ARRAY['
       || string_agg('sum(' || quote_ident(c.column_name) || ')'
                    , ', ' ORDER BY c.column_name)
       || ']
          FROM '
       || quote_ident(_schema) || '.' || quote_ident(_tbl)
   FROM   information_schema.columns c
   WHERE  table_schema = _schema
   AND    table_name = _tbl
   AND    data_type = 'integer'
   );

END;
$BODY$
LANGUAGE plpgsql;
Call:
SELECT unnest(names) AS name, unnest(sums) AS col_sum
FROM f_get_sums('public', 'somereport');
Returns:
     name      | col_sum
---------------+---------
 int_col1      |    6614
 other_int_col |    8364
 third_int_col | 2720642
Explanation
The difficulty is defining the RETURN type for the function while the number and names of the returned columns vary. One detail that helps a little: you only want integer columns.
I solved this by forming an array of bigint (sum(int_col) returns bigint). In addition I return an array of column names. Both are sorted alphabetically by column name.
In the function call I split these arrays up with unnest(), arriving at the handsome format displayed above.
The dynamically created and executed query is advanced stuff. Don't get confused by the multiple layers of quotes. Basically you have EXECUTE, which takes a text argument containing the SQL query to execute. This text, in turn, is provided by a secondary SQL query that builds the query string for the primary query.
If this is too much at once, or plpgsql is rather new to you, start with this related answer where I explain the basics with a much simpler function and provide links to the manual for the major features.
If performance is essential, query the Postgres catalog directly (pg_catalog.pg_attribute) instead of the standardized (but slow) information_schema.columns.
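Here is a simple sketch of the same column lookup against pg_attribute (assuming the table public.somereport from above):
SELECT a.attname
FROM   pg_catalog.pg_attribute a
WHERE  a.attrelid = 'public.somereport'::regclass  -- the table from above
AND    a.atttypid = 'int4'::regtype                -- integer columns only
AND    a.attnum > 0                                -- skip system columns
AND    NOT a.attisdropped;                         -- skip dropped columns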

Related

Sum all numeric columns in database and log results

I have a query which gives me all numeric columns in my Postgres database:
SELECT table_schema, table_name, column_name
FROM   information_schema.columns
WHERE  table_schema IN (
           'datawarehouse_x',
           'datawarehouse_y',
           'datawarehouse_z',
           'datawarehouse_w'
       )
AND    udt_name NOT IN ('date', 'timestamp', 'bool', 'varchar')
AND    column_name NOT LIKE '%_id'
This gives me what I need:
table_schema   table_name   column_name
schema_1       table_x      column_z
schema_2       table_y      column_w
I checked it and it's fine.
What I now want to do is query each of these columns as a select sum(column), and then insert the schema_name, table_name, query result and the current date into a log table on a daily basis.
Writing the results into a target table shouldn't be a big deal, but how in the world can I run queries according to the results of this query?
Thanks in advance.
EDIT: What I will write afterwards is a procedure that takes schema/table/column as input, queries the table, and writes into the log table. I just do not know the part in between. This is roughly what I would do, but I don't yet know which types I should use for schema, table and column:
create or replace function sandbox.daily_routine_metrics(schema_name text, table_name text, column_name text)
returns void
language plpgsql
as $$
BEGIN
  -- plain text parameters work for all three names; format() handles the quoting
  EXECUTE format(
    'INSERT INTO logging.daily_routine_size
     SELECT %L, %L, %L, current_timestamp, sum(%I)
     FROM   %I.%I',
    schema_name, table_name, column_name,   -- logged as literals
    column_name, schema_name, table_name);  -- used as identifiers
END;
$$;
The feature you need is known as "dynamic SQL". Implementations are RDBMS-specific; the documentation for Postgres is here.
Whilst it's possible to achieve what you want in dynamic SQL, you might find it easier to use a scripting language like Python or Ruby instead. Dynamic SQL is hard to code and debug: you find yourself concatenating lots of hardcoded strings with results from SQL queries, printing them to the console to see if they work, and discovering all sorts of edge cases that blow up.
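That said, the glue can stay inside Postgres: a DO block can iterate over the column query and call the function for each hit. A minimal sketch, assuming the (fixed) function and the schema list from the question:
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN
        SELECT table_schema, table_name, column_name
        FROM   information_schema.columns
        WHERE  table_schema IN ('datawarehouse_x', 'datawarehouse_y',
                                'datawarehouse_z', 'datawarehouse_w')
        AND    udt_name NOT IN ('date', 'timestamp', 'bool', 'varchar')
        AND    column_name NOT LIKE '%_id'
    LOOP
        -- one INSERT ... SELECT sum(...) per column, via the function above
        PERFORM sandbox.daily_routine_metrics(r.table_schema, r.table_name, r.column_name);
    END LOOP;
END$$;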

Change all columns in table of a certain data type in PostgreSQL 9.6

It seems like several months ago I came across a SO question covering this but I can't seem to find it now.
Basically, I want to do two things.
First, a number of tables were made with several columns numeric(20,2) and I want to just change them all to numeric. The statement is simple enough for one column:
ALTER TABLE table_name
ALTER COLUMN code
TYPE numeric;
Takes care of that.
Second, on these columns I want to remove any trailing zeros:
UPDATE table_name
SET code = replace(replace(code::text, '.50', '.5'), '.00', '')::numeric;
Having difficulty figuring out how to automate it so I only have to specify the table and it will clean up the table. Pretty sure this is possible.
You can find all of the columns with the data type that you want to change with a statement like:
select column_name, table_name
from information_schema.columns
where data_type='numeric'
and numeric_precision = 20
and numeric_scale = 2;
You can iterate over the result with a custom function or with a DO command such as:
do $$
declare
    t record;
begin
    for t in
        select table_schema, table_name, column_name
        from   information_schema.columns
        where  data_type = 'numeric'
        and    numeric_precision = 20
        and    numeric_scale = 2
    loop
        execute format('alter table %I.%I alter column %I type numeric',
                       t.table_schema, t.table_name, t.column_name);
    end loop;
end$$;
Also, to remove trailing zeroes, a more general solution is to cast the value to float or double precision and then back to numeric, e.g.:
update table_name
set    code = cast(cast(code as double precision) as numeric);
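To automate the cleanup per table too, the same DO pattern works. A minimal sketch, assuming the ALTERs above already ran; 'table_name' is a placeholder, and note it rewrites every numeric column of that table:
do $$
declare
    c record;
begin
    for c in
        select column_name
        from   information_schema.columns
        where  table_name = 'table_name'   -- your table here
        and    data_type = 'numeric'
    loop
        execute format('update %I set %I = cast(cast(%I as double precision) as numeric)',
                       'table_name', c.column_name, c.column_name);
    end loop;
end$$;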

Renaming multiple columns in PostgreSQL

My table has a bunch of columns in the following format:
_settingA
_settingB
_settingC
And I want to rename them simply to add a prefix as follows:
_1_settingA
_1_settingB
_1_settingC
I have a lot more than three columns to rename in this way. If I had just three, I'd just do it manually one by one.
What is the quickest / most efficient way to achieve this?
There's no single-command approach. Obviously you could type multiple RENAME commands yourself, but let me introduce an improvement :) As I said in this answer
...for all such bulk-admin-operations you could use PostgreSQL system tables to generate queries for you instead of writing them by hand
In your case it would be:
SELECT 'ALTER TABLE ' || tab_name || ' RENAME COLUMN '
       || quote_ident(column_name) || ' TO '
       || quote_ident('_1' || column_name) || ';'
FROM (
    SELECT quote_ident(table_schema) || '.' || quote_ident(table_name) AS tab_name,
           column_name
    FROM   information_schema.columns
    WHERE  table_schema = 'schema_name'
    AND    table_name = 'table_name'
    AND    column_name LIKE '\_%'
) sub;
That'll give you a set of strings which are SQL commands, like:
ALTER TABLE schema_name.table_name RENAME COLUMN "_settingA" TO "_1_settingA";
ALTER TABLE schema_name.table_name RENAME COLUMN "_settingB" TO "_1_settingB";
...
There's no need to use table_schema in the WHERE clause if your table is in the public schema. Also remember to use the function quote_ident() -- read my original answer for more explanation.
Edit:
I've changed my query so it now works for all columns whose names begin with an underscore _. Because the underscore is a special character in SQL pattern matching, we must escape it (using \) to actually find it.
Something simple like this will work.
SELECT FORMAT(
    'ALTER TABLE %I.%I.%I RENAME %I TO %I;',
    table_catalog,
    table_schema,
    table_name,
    column_name,
    '_PREFIX_' || column_name
)
FROM information_schema.columns
WHERE table_name = 'foo';
%I will do quote_ident() for you, which is substantially nicer. If you're in psql you can run the output with \gexec.
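For example, a hedged sketch of the full \gexec round trip ('foo' and the prefix are placeholders):
SELECT format('ALTER TABLE %I.%I RENAME %I TO %I',
              table_schema, table_name,
              column_name, '_PREFIX_' || column_name)
FROM  information_schema.columns
WHERE table_name = 'foo'
\gexec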
You can use the following function:
(I use this to add a prefix on tables which have more than 50 columns)
First create the function:
CREATE OR REPLACE FUNCTION rename_cols(schema_name_ text, table_name_ text, prefix varchar(4))
  RETURNS bool AS
$BODY$
DECLARE
    rec_selection record;
BEGIN
    FOR rec_selection IN (
        SELECT column_name FROM information_schema.columns
        WHERE table_schema = schema_name_ AND table_name = table_name_) LOOP
        -- format() with %I quotes identifiers safely, even with odd characters
        EXECUTE format('ALTER TABLE %I.%I RENAME COLUMN %I TO %I',
                       schema_name_, table_name_,
                       rec_selection.column_name,
                       prefix || rec_selection.column_name);
    END LOOP;
    RETURN true;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
Then execute the function:
SELECT rename_cols('public','test','d');
Hope it will be useful.
You can't do that in a single command. The manual on ALTER TABLE states:
All the actions except RENAME and SET SCHEMA can be combined into a list of multiple alterations to apply in parallel.
The most efficient way is using ActiveRecord.

Check a whole table for a single value

Background: I'm converting a database table to a format that doesn't support null values. I want to replace the null values with an arbitrary number so my application can support null values.
Question: I'd like to search my whole table for a value ("999999", for example) to make sure that it doesn't appear in the table. I could write a script to test each column individually, but I wanted to know if there is a way I could do this in pure sql without enumerating each field. Is that possible?
You can use a special feature of the PostgreSQL type system:
SELECT *
FROM tbl t
WHERE t::text LIKE '%999999%';
There is a composite type of the same name for every table that you create in PostgreSQL. And there is a text representation for every type in PostgreSQL (to input / output values).
Therefore you can just cast the whole row to text and if the string '999999' is contained in any column (its text representation, to be precise) it is guaranteed to show in the query above.
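To see what that row representation looks like, cast a single row (the output shown is illustrative, for a hypothetical three-column table):
SELECT t::text FROM tbl t LIMIT 1;
-- e.g. returns one string per row: (1,999999,foo)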
You cannot rule out false positives completely, though, if separators and / or decorators used by Postgres for the row representation can be part of the search term. It's just very unlikely. And positively not the case for your search term '999999'.
There was a very similar question on codereview.SE recently. I added some more explanation in my answer there.
create or replace function test_values( real ) returns setof record as
$$
declare
    query  text;
    output record;
begin
    for query in
        select 'select distinct ' || quote_literal(table_name) || '::text as table_name, '
            || quote_literal(column_name) || '::text as column_name from '
            || quote_ident(table_name)
            || ' where ' || quote_ident(column_name) || ' = '
            || quote_literal($1::text) || '::' || data_type
        from information_schema.columns
        where table_schema = 'public'
        and   numeric_precision is not null
    loop
        raise notice 'running: %', query;
        -- loop over the result, so columns without a match return no row at all
        for output in execute query loop
            return next output;
        end loop;
    end loop;
    return;
end;$$ language plpgsql;
select distinct * from test_values(999999) as t(table_name text, column_name text);

How can I find columns which have non-null values?

I have many columns in an Oracle database, and some new ones were added with values. I'd like to find out which columns have values other than 0 or null. So I am looking for column names for which some sort of useful value exists in at least one row.
How do I do this?
Update: This sounds very close. How do I modify this to suit my needs?
select column_name, nullable, num_distinct, num_nulls
from all_tab_columns
where table_name = 'SOME_TABLE'
You can query all the columns using the dba_tab_cols view and then see if there are columns which have values other than 0 or null.
create or replace function f_has_null_rows(
    i_table_name  in dba_tab_cols.table_name%type,
    i_column_name in dba_tab_cols.column_name%type
) return number is
    v_sql   varchar2(200);
    v_count number;
begin
    v_sql := 'select count(*) from ' || i_table_name ||
             ' where ' || i_column_name || ' is not null and ' ||
             i_column_name || ' <> 0';
    execute immediate v_sql into v_count;
    return v_count;
end;
/
select table_name, column_name from dba_tab_cols
where  f_has_null_rows(table_name, column_name) > 0;
If you have synonyms in some schemas, you might find that some of the tables are repeated. You'll have to change the code to cater to that.
Also, the check "is not equal to zero" is not valid for columns that are not numeric, and will raise errors for columns of date datatype. You'll need to add conditions for those cases: use the data_type column in dba_tab_cols and add the condition as needed.
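For example, a hedged sketch that restricts the helper above to numeric columns, so the <> 0 test never hits a DATE or VARCHAR2 column:
select table_name, column_name
from   dba_tab_cols
where  data_type in ('NUMBER', 'FLOAT')
and    f_has_null_rows(table_name, column_name) > 0;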
select column_name
from   user_tab_columns
where  table_name = 'EMP' and num_nulls = 0;
This finds columns that do not contain any NULL values (according to the gathered statistics), so you can act on those.
Sorry, I misread the question the first time.
From this post on Oracle's forums
Assuming your stats are up to date:
SELECT t.table_name,
t.column_name
FROM user_tab_columns t
WHERE t.nullable = 'Y'
AND t.num_distinct = 0;
Will return you a list of table names and columns that contain only NULLs. You might want to add something like:
AND t.table_name = upper('Your_table_name')
in there to limit the results to just your table.
select 'cats' as mycolumnname from T
where exists (select id from T where cats is not null)
union
select 'dogs' as mycolumnname from T
where exists (select id from T where dogs is not null)
-- ad nauseam
is how to do it in SQL. EDIT: Different flavors of SQL might let you optimize with LIMIT or TOP 'n' in the subquery. Or maybe they're even smart enough to realize that EXISTS only needs one row and optimize silently/transparently. P.S. Add your test for zero to the subquery.
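Per that P.S., the zero test goes into each subquery, e.g. (a sketch reusing the answer's example columns):
select 'cats' as mycolumnname from T
where exists (select id from T where cats is not null and cats <> 0)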