Change all columns in table of a certain data type in PostgreSQL 9.6 - sql

It seems like several months ago I came across a SO question covering this but I can't seem to find it now.
Basically, I want to do two things.
First, a number of tables were made with several columns numeric(20,2) and I want to just change them all to numeric. The statement is simple enough for one column:
ALTER TABLE table_name
ALTER COLUMN code
TYPE numeric;
Takes care of that.
Second, on these columns I want to remove any trailing zeros:
UPDATE table_name
SET code = replace(replace(code::text, '.50', '.5'), '.00', '')::numeric;
I'm having difficulty figuring out how to automate this so that I only have to specify the table and it gets cleaned up. I'm pretty sure this is possible.

You can find all of the columns with the data type that you want to change with a statement like:
select column_name, table_name
from information_schema.columns
where data_type='numeric'
and numeric_precision = 20
and numeric_scale = 2;
You can iterate over the result with a custom function or with a DO command such as:
do $$
declare
    t record;
begin
    for t in
        select column_name, table_name
        from information_schema.columns
        where data_type = 'numeric'
          and numeric_precision = 20
          and numeric_scale = 2
    loop
        -- format() with %I quotes the identifiers, which also handles mixed-case names
        execute format('alter table %I alter column %I type numeric',
                       t.table_name, t.column_name);
    end loop;
end$$;
Also, to remove trailing zeroes, a more general solution is to cast the value to float or double precision and then back to numeric, e.g:
set code = cast(cast(code as double precision) as numeric);
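Putting both pieces together, here is a minimal sketch of a DO block that takes one table, widens every numeric(20,2) column to plain numeric, and strips trailing zeroes via the double-precision round trip suggested above. The table name 'table_name' is a placeholder, and the round trip assumes the values fit comfortably within double precision's roughly 15 significant digits:
do $$
declare
    c record;
    tbl text := 'table_name';   -- placeholder: the table to clean up
begin
    for c in
        select column_name
        from information_schema.columns
        where table_name = tbl
          and data_type = 'numeric'
          and numeric_precision = 20
          and numeric_scale = 2
    loop
        -- drop the fixed (20,2) scale
        execute format('alter table %I alter column %I type numeric',
                       tbl, c.column_name);
        -- strip trailing zeroes by round-tripping through double precision
        execute format('update %I set %I = cast(cast(%I as double precision) as numeric)',
                       tbl, c.column_name, c.column_name);
    end loop;
end$$;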

Related

Calculate Avg in for loop for columns in a table in PostgreSQL

I come from the Python world, where many things are colorful and easy. Now I'm trying to make my way into SQL, because well, I want to challenge myself outside of pandas, and gain the important experience in SQL.
That said, I have the following question.
I have the following snippet:
do
$do$
declare i varchar(50);
declare average int;
begin
for i in (
select column_name
FROM information_schema.columns
where table_schema = 'public'
and table_name = 'example_table'
and column_name like '%suffix') loop
--raise notice 'Value: %', i;
select AVG(i) as average from example_table;
raise notice 'Value: %', i;
end loop;
end;
$do$
As I learned from the documentation, for loops are only possible inside a do block, and certain variables have to be declared. I did this for the i variable, which contains the name of the column I want to iterate over. But I want to get the average of each column and add it as a row to a table with two columns: one for the feature (the i variable) and one for the average of that column. I thought that would be possible with my code snippet above, but I receive an error message saying that the function avg(character varying) does not exist.
When I use AVG outside of a for loop on a single column, it does retrieve the average value of that numeric column, but when I do it inside the for loop, it says that the aggregate function does not exist.
Could someone help me out with this please?
UPDATE:
I was taking a step back and trying to make the story shorter:
select column_name
FROM information_schema.columns
where table_schema = 'public'
and table_name = 'my_table'
and column_name like '%wildcard';
This snippet yields a table with a single column called column_name, listing all the columns that fulfill the constraints stated in the where clause.
I just want to add a column with the average value of those columns.
If you only need it for a single table, you can use:
select x.col, avg(x.value::numeric)
from example_table t
cross join lateral (
select col, value
from jsonb_each(to_jsonb(t)) as e(col, value)
where jsonb_typeof(e.value) = 'number'
) x
group by x.col;
The "magic" is in converting each row from the table into a JSON value. This is what to_jsonb(t) does (t is the alias given to the table in the main query). So we get something like {"name": "Bla", "value": 3.14, "length": 10, "some_date": "2022-03-02"}. So each column name is a key in the JSON value.
This JSON is then turned into one row per column (= key) using the jsonb_each() function, but only rows (= columns) that have a number value are retained. So the derived table returns one row per column and table row. The outer query then simply aggregates this per column. The drawback is that you need to write one query for each table.
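As a quick illustration of the intermediate step, this is what the lateral expansion alone produces, one output row per (table row, column) pair (example_table and its columns are just the hypothetical ones from the JSON above):
select e.col, e.value
from example_table t
cross join lateral jsonb_each(to_jsonb(t)) as e(col, value);
-- e.g. ('name', '"Bla"'), ('value', '3.14'), ('length', '10'), ('some_date', '"2022-03-02"')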
If you need some kind of report for all tables in a schema, you can use a variation of this answer
with all_selects as (
select table_schema, table_name, 'select '||string_agg(format('avg(%I) as %I', column_name, column_name), ', ')||format(' from %I.%I', table_schema, table_name) as query
from information_schema.columns
where table_schema = 'public'
and data_type in ('bigint', 'integer', 'double precision', 'smallint', 'numeric', 'real')
group by table_schema, table_name
), all_aggregates as (
select table_schema, table_name,
query_to_xml(query, true, true, '') as result
from all_selects
)
select ag.table_schema, ag.table_name, r.column_name, nullif(r.average, '')::numeric as average
from all_aggregates ag
cross join xmltable('/row/*' passing result
columns column_name text path 'local-name()',
average text path '.') as r
This is a bit more tricky. The first part all_selects builds a query for each table in the schema public to apply the avg() aggregate on each column that can contain a number (where data type in (...))
So e.g. this returns a string select avg(value) as value, avg(length) as length from example_table
The next step is running each of these queries through query_to_xml() (sadly there is no built-in query_to_jsonb()).
query_to_xml() would return something like:
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<value>12.345</value>
<length>42</length>
</row>
So one tag for each column (which is the result of the avg(..) function).
The final select then uses xmltable() to turn each tag from the XML result into a row returning the column name and value
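With the example values from the XML above, the final result would look roughly like this (schema, table and column names are hypothetical):
 table_schema | table_name    | column_name | average
--------------+---------------+-------------+---------
 public       | example_table | value       |  12.345
 public       | example_table | length      |      42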
Of course you can do this in PL/pgSQL as well:
do
$do$
declare
l_rec record;
l_sql text;
l_average numeric;
begin
for l_rec in
select table_schema, table_name, column_name
from information_schema.columns
where table_schema = 'public'
and data_type in ('bigint', 'integer', 'double precision', 'smallint', 'numeric', 'real')
loop
l_sql := format('select avg(%I) from %I.%I', l_rec.column_name, l_rec.table_schema, l_rec.table_name);
execute l_sql
into l_average;
raise notice 'Average for %.% is: %', l_rec.table_name, l_rec.column_name, l_average;
end loop;
end;
$do$
Note the condition on the data_type column to only process columns that can be averaged. This is, however, more costly, as it runs one query per column rather than one per table.
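If you also want to persist the results (the original goal of a two-column feature/average table), a sketch of a variant that inserts instead of raising notices could look like this; the column_averages table and the restriction to example_table are assumptions:
create table if not exists column_averages (feature text, average numeric);

do
$do$
declare
    l_rec record;
begin
    for l_rec in
        select table_schema, table_name, column_name
        from information_schema.columns
        where table_schema = 'public'
          and table_name = 'example_table'
          and data_type in ('bigint', 'integer', 'double precision', 'smallint', 'numeric', 'real')
    loop
        -- %L quotes the column name as a literal, %I as an identifier
        execute format('insert into column_averages (feature, average)
                        select %L, avg(%I) from %I.%I',
                       l_rec.column_name, l_rec.column_name,
                       l_rec.table_schema, l_rec.table_name);
    end loop;
end;
$do$;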

How to find non-numeric columns containing only numeric data?

I'd like to find all columns in my Oracle database schema that contain only numeric data but have a non-numeric type. (So basically, column candidates whose data types were probably chosen incorrectly.)
I have a query for all varchar2-columns:
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM user_tab_cols
WHERE DATA_TYPE = 'VARCHAR2';
Furthermore I have a query to check for any non-numeric data inside a table myTable and a column myColumn:
SELECT 1
FROM myTable
WHERE NOT REGEXP_LIKE(myColumn, '^[[:digit:]]+$');
I'd like to combine both queries so that the first query returns only the rows for which the second query finds nothing.
The main problem here is that the first query works on the meta layer of the data dictionary, where TABLE_NAME and COLUMN_NAME come back as data, whereas I need those values as identifiers (and not as data) in the second query.
In pseudo-SQL I have something like that in mind:
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM user_tab_cols
WHERE DATA_TYPE = 'VARCHAR2'
AND NOT EXISTS
(SELECT 1 from asIdentifier(TABLE_NAME)
WHERE NOT REGEXP_LIKE(asIdentifier(COLUMN_NAME), '^[[:digit:]]+$'));
Create a function as this:
create or replace function isNumeric(val in VARCHAR2) return INTEGER AS
res NUMBER;
begin
res := TO_NUMBER(val);
RETURN 1;
EXCEPTION
WHEN OTHERS THEN
RETURN 0;
END;
Then you can use it like this:
DECLARE
r integer;
BEGIN
For aCol in (SELECT TABLE_NAME, COLUMN_NAME FROM user_tab_cols WHERE DATA_TYPE = 'VARCHAR2') LOOP
-- What about CHAR and CLOB data types?
execute immediate 'select count(*) from '||aCol.TABLE_NAME||' WHERE isNumeric('||aCol.COLUMN_NAME||') = 0' into r;
if r = 0 then
DBMS_OUTPUT.put_line(aCol.TABLE_NAME ||' '||aCol.COLUMN_NAME ||' contains numeric values only');
end if;
end loop;
end;
Note that the performance of this PL/SQL block will be poor. Hopefully this is a one-time job only.
There are two possible approaches: dynamic SQL (DSQL) and XML.
The first one was already demonstrated in another reply, and it's faster.
XML approach just for fun
create or replace function to_number_udf(p in varchar2) return number
deterministic is
pragma udf;
begin
return p * 0;
exception when invalid_number or value_error then return 1;
end to_number_udf;
/
create table t_chk(str1, str2) as
select '1', '2' from dual union all
select '0001.1000', 'helloworld' from dual;
SQL> column owner format a20
SQL> column table_name format a20
SQL> column column_name format a20
SQL> with tabs_to_check as
2 (
3 select 'collection("oradb:/'||owner||'/'||table_name||'")/ROW/'||column_name||'/text()' x,
4 atc.*
5 from all_tab_columns atc
6 where table_name = 'T_CHK'
7 and data_type = 'VARCHAR2'
8 and owner = user
9 )
10 select --+ no_query_transformation
11 owner, table_name, column_name
12 from tabs_to_check ttc, xmltable(x columns "." varchar2(4000)) x
13 group by owner, table_name, column_name
14 having max(to_number_udf(".")) = 0;
OWNER TABLE_NAME COLUMN_NAME
-------------------- -------------------- --------------------
TEST T_CHK STR1
PS. On Oracle 12.2 you can use to_number(... default ... on conversion error) instead of UDF.
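For reference, a minimal sketch of that 12.2 built-in (literal values only, just to show the behaviour):
-- returns NULL instead of raising ORA-01722 when the text is not a number,
-- so non-numeric values can be detected with a simple IS NULL test
select to_number('0001.1000' default null on conversion error) as ok_value,
       to_number('helloworld' default null on conversion error) as bad_value
from dual;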
The faster way to check if a string is all digits vs. contains at least one non-digit character is to use the translate function. Alas, due to the non-SQL Standard way Oracle handles empty strings, the form of the function we must use is a little complicated:
translate(input_string, 'z0123456789', 'z')
(z can be any non-digit character; we need it so that the third argument is not null). This works by translating z to itself and 0, etc. to nothing. So if the input string was null or all-digits, and ONLY in that case, the value returned by the function is null.
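A small illustration of that behaviour (the values are made up):
-- translate() keeps 'z' and removes every digit; only all-digit (or NULL)
-- input leaves nothing behind, i.e. returns NULL
select val, translate(val, 'z0123456789', 'z') as leftover
from (
    select '12345' as val from dual union all
    select '12a45'        from dual union all
    select 'abc'          from dual
);
-- leftover: NULL, 'a', 'abc'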
In addition: to make the process faster, you can test each column with an EXISTS condition. If a column is not meant to be numeric, then in most cases the EXISTS condition will become true very quickly, so you will have to inspect a very small number of values from such columns.
As I tried to make this work, I ran into numerous side issues. Presumably you want to look in all schemas (except SYS and perhaps SYSTEM). So you need to run the procedure (anonymous block) from an account with SYSDBA privileges. Then - I ran into issues with non-standard table and column names (names starting with an underscore and such); which brought to mind identifiers defined in double-quotes - a terrible practice.
For illustration, I will use the HR schema - on which the approach worked. You may need to tweak this further; I wasn't able to make it work by changing the line
and owner = 'HR'
to
and owner != 'SYS'
So - with this long intro - here is what I did.
First, in a "normal" user account (my own, named INTRO - I run a very small database, with only one "normal" user, plus the Oracle "standard" users like SCOTT, HR etc.) - so, in schema INTRO, I created a table to receive the owner name, table name and column name for all columns of data type VARCHAR2 and which contain only "numeric" values or null (numeric defined the way you did.) NOTE HERE: If you then want to really check for all numeric values, you will indeed need a regular expression, or something like what Wernfried has shown; I would still, otherwise, use an EXISTS condition rather than a COUNT in the anonymous procedure.
Then I created an anonymous block to find the needed columns. NOTE: You will not have a schema INTRO - so change it everywhere in my code (both in creating the table and in the anonymous block). If the procedure completes successfully, you should be able to query the table. I show that at the end too.
While logged in as SYS (or another user with SYSDBA powers):
create table intro.cols_with_numbers (
owner_name varchar2(128),
table_name varchar2(128),
column_name varchar2(128)
);
declare x number;
begin
execute immediate 'truncate table intro.cols_with_numbers';
for t in ( select owner, table_name, column_name
from dba_tab_columns
where data_type like 'VARCHAR2%'
and owner = 'HR'
)
loop
execute immediate 'select case when exists (
select *
from ' || t.owner || '.' || t.table_name ||
' where translate(' || t.column_name || ',
''z0123456789'', ''z'') is not null
) then 1 end
from dual'
into x;
if x is null then
insert into intro.cols_with_numbers (owner_name, table_name, column_name)
values(t.owner, t.table_name, t.column_name);
end if;
end loop;
end;
/
Run this procedure and then query the table:
select * from intro.cols_with_numbers;
no rows selected
(which means there were no numeric columns in tables in the HR schema, in the wrong data type VARCHAR2 - or at least, no such columns that had only non-negative integer values.) You can test further, by intentionally creating a table with such a column and testing to see it is "caught" by the procedure.
ADDED - Here is what happens when I change the owner from 'HR' to 'SCOTT':
PL/SQL procedure successfully completed.
OWNER_NAME TABLE_NAME COLUMN_NAME
-------------------- -------------------- --------------------
SCOTT BONUS JOB
SCOTT BONUS ENAME
so it seems to work fine (although on other schemas I sometimes run into an error... I'll see if I can figure out what that is).
In this case the table is empty (no rows!) - this is one example of a "false positive" you may find. (More generally, you will get a false positive if everything in a VARCHAR2 column is null - in all rows of the table.)
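One way to rule out that particular false positive is to additionally require at least one non-null value before recording the column. This is only a sketch of how the loop body above could be extended (it assumes a second variable, y number, declared next to x):
        -- does the column contain at least one non-null value?
        execute immediate 'select case when exists (
                               select * from ' || t.owner || '.' || t.table_name ||
                             ' where ' || t.column_name || ' is not null
                           ) then 1 end
                           from dual'
        into y;
        if x is null and y = 1 then
            insert into intro.cols_with_numbers (owner_name, table_name, column_name)
            values (t.owner, t.table_name, t.column_name);
        end if;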
NOTE also that a column may have only numeric values and still the best data type would be VARCHAR2. This is the case when the values are simply identifiers and are not meant as "numbers" (which we can compare to each other or to fixed values, and/or with which we can do arithmetic). Example - a SSN (Social Security Number) or the equivalent in other countries; the SSN is each person's "official" identifier for doing business with the government. The SSN is numeric (actually, perhaps to accentuate the fact it is NOT supposed to be a "number" despite the name, it is often written with a couple of dashes...)

Renaming multiple columns in PostgreSQL

My table has a bunch of columns in the following format:
_settingA
_settingB
_settingC
And I want to rename them simply to add a prefix as follows:
_1_settingA
_1_settingB
_1_settingC
I have a lot more than three columns to rename in this way. If I had just three, I'd just do it manually one by one.
What is the quickest / most efficient way to achieve this?
There's no single-command approach. Obviously you could type multiple RENAME commands yourself, but let me introduce an improvement :) As I said in this answer
...for all such bulk-admin-operations you could use PostgreSQL system tables to generate queries for you instead of writing them by hand
In your case it would be:
SELECT
'ALTER TABLE ' || tab_name || ' RENAME COLUMN '
|| quote_ident(column_name) || ' TO '
|| quote_ident( '_1' || column_name) || ';'
FROM (
SELECT
quote_ident(table_schema) || '.' || quote_ident(table_name) as tab_name,
column_name
FROM information_schema.columns
WHERE
table_schema = 'schema_name'
AND table_name = 'table_name'
AND column_name LIKE '\_%'
) sub;
That'll give you a set of strings which are SQL commands like:
ALTER TABLE schema_name.table_name RENAME COLUMN "_settingA" TO "_1_settingA";
ALTER TABLE schema_name.table_name RENAME COLUMN "_settingB" TO "_1_settingB";
...
There's no need to use table_schema in the WHERE clause if your table is in the public schema. Also remember to use the quote_ident() function -- read my original answer for more explanation.
Edit:
I've changed my query so it now works for all columns whose names begin with an underscore _. Because the underscore is a special character in SQL pattern matching, we must escape it (using \) to actually find it.
Something simple like this will work.
SELECT FORMAT(
'ALTER TABLE %I.%I.%I RENAME %I TO %I;',
table_catalog,
table_schema,
table_name,
column_name,
'_PREFIX_' || column_name
)
FROM information_schema.columns
WHERE table_name = 'foo';
%I does the quote_ident() work for you, which is substantially nicer. If you're in psql you can run the generated statements with \gexec.
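For example, in psql the whole generate-and-execute round trip could look like this (public, foo, and the _1 prefix are placeholders):
SELECT format('ALTER TABLE %I.%I RENAME COLUMN %I TO %I',
              table_schema, table_name, column_name, '_1' || column_name)
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name   = 'foo'
  AND column_name LIKE '\_%'
\gexec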
You can use the following function :
(I use this to add a prefix on tables which have more than 50 columns.)
First create the function :
CREATE OR REPLACE FUNCTION rename_cols( schema_name_ text,table_name_ text, prefix varchar(4))
RETURNS bool AS
$BODY$
DECLARE
rec_selection record;
BEGIN
FOR rec_selection IN (
SELECT column_name FROM information_schema.columns WHERE table_schema = schema_name_ AND table_name = table_name_) LOOP
EXECUTE 'ALTER TABLE '||schema_name_||'.'||table_name_||' RENAME COLUMN "'|| rec_selection.column_name ||'" TO "'||prefix|| rec_selection.column_name ||'" ;';
END LOOP;
RETURN True;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
Then execute function :
SELECT rename_cols('public','test','d');
Hope it will be useful.
You can't do that.
All the actions except RENAME and SET SCHEMA can be combined into a list of multiple alterations to apply in parallel.
The most efficient way is to use ActiveRecord.

Check a whole table for a single value

Background: I'm converting a database table to a format that doesn't support null values. I want to replace the null values with an arbitrary number so my application can support null values.
Question: I'd like to search my whole table for a value ("999999", for example) to make sure that it doesn't appear in the table. I could write a script to test each column individually, but I wanted to know if there is a way I could do this in pure sql without enumerating each field. Is that possible?
You can use a special feature of the PostgreSQL type system:
SELECT *
FROM tbl t
WHERE t::text LIKE '%999999%';
There is a composite type of the same name for every table that you create in PostgreSQL. And there is a text representation for every type in PostgreSQL (to input / output values).
Therefore you can just cast the whole row to text and if the string '999999' is contained in any column (its text representation, to be precise) it is guaranteed to show in the query above.
You cannot rule out false positives completely, though, if separators and / or decorators used by Postgres for the row representation can be part of the search term. It's just very unlikely. And positively not the case for your search term '999999'.
There was a very similar question on codereview.SE recently. I added some more explanation in my answer there.
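If you are worried about false positives, a stricter (though more verbose) variant is to expand the row with the same to_jsonb() trick used in the averages answer above and compare each column's text value exactly (a sketch; tbl is a placeholder):
SELECT t.*
FROM tbl t
WHERE EXISTS (
    SELECT 1
    FROM jsonb_each_text(to_jsonb(t)) AS e(col, val)
    WHERE e.val = '999999'
);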
create or replace function test_values( real ) returns setof record as
$$
declare
    query text;
    output record;
begin
    for query in
        select 'select distinct ''' || table_name || '''::text table_name, '''
               || column_name || '''::text column_name from ' || quote_ident(table_name)
               || ' where ' || quote_ident(column_name) || ' = ''' || $1::text || '''::' || data_type
        from information_schema.columns
        where table_schema = 'public'
          and numeric_precision is not null
    loop
        raise notice '%1 qqqq', query;
        execute query::text into output;
        return next output;
    end loop;
    return;
end;
$$ language plpgsql;
select distinct * from test_values( 999999 ) as t(table_name text ,column_name text)

How can I find columns which have non-null values?

I have many columns in an Oracle database, and some new ones have been added with values. I'd like to find out which columns have values other than 0 or null. So I am looking for column names for which some sort of useful value exists in at least one row.
How do I do this?
Update: This sounds very close. How do I modify this to suit my needs?
select column_name, nullable, num_distinct, num_nulls
from all_tab_columns
where table_name = 'SOME_TABLE'
You can query all the columns using the dba_tab_cols view and then see if there are columns which have values other than 0 or null.
create or replace function f_has_null_rows(
i_table_name in dba_tab_cols.table_name%type,
i_column_name in dba_tab_cols.table_name%type
) return number is
v_sql varchar2(200);
v_count number;
begin
v_sql := 'select count(*) from ' || i_table_name ||
         ' where ' || i_column_name || ' is not null and ' ||
         i_column_name || ' <> 0';
execute immediate v_sql into v_count;
return v_count;
end;
/
select table_name, column_name from dba_tab_Cols
where f_has_null_rows (table_name, column_name) > 0
If you have synonyms in some schemas, you might find that some of the tables are repeated. You'll have to change the code to cater to that.
Also, the check "is not equal to zero" might not be valid for columns that are not integers, and it will give errors if the column is of a DATE data type. You'll need to add conditions for those cases. Use the data_type column in dba_tab_cols and add the condition as needed.
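For instance, to keep the "<> 0" comparison from failing on dates and free-form text, you could filter on data_type before calling the function (a sketch; widen the type list as appropriate for your schema):
select table_name, column_name
from dba_tab_cols
where data_type = 'NUMBER'   -- only columns where the "<> 0" comparison is safe
  and f_has_null_rows(table_name, column_name) > 0;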
Select Column_name
from user_tab_columns
where table_name='EMP' and num_nulls=0;
This finds columns that do not contain any NULL values, so you can perform your actions on those.
Sorry, I misread the question the first time.
From this post on Oracle's forums
Assuming your stats are up to date:
SELECT t.table_name,
t.column_name
FROM user_tab_columns t
WHERE t.nullable = 'Y'
AND t.num_distinct = 0;
This will return a list of table names and columns that contain only NULL values. You might want to add something like:
AND t.table_name = upper('Your_table_name')
in there to limit the results to just your table.
select 'cats' as mycolumnname from T
where exists (select id from T where cats is not null)
union
select 'dogs' as mycolumnname from T
where exists (select id from T where dogs is not null)
-- ...and so on, ad nauseam
is how to do it in SQL. EDIT: Different flavors of SQL might let you optimize with LIMIT or TOP 'n' in the subquery. Or maybe they're even smart enough to realize that EXISTS only needs one row and optimize silently/transparently. P.S. Add your test for zero to the subquery.