Background: I'm converting a database table to a format that doesn't support null values. I want to replace the null values with an arbitrary number so my application can support null values.
Question: I'd like to search my whole table for a value ("999999", for example) to make sure that it doesn't appear in the table. I could write a script to test each column individually, but I wanted to know if there is a way to do this in pure SQL without enumerating each field. Is that possible?
You can use a special feature of the PostgreSQL type system:
SELECT *
FROM tbl t
WHERE t::text LIKE '%999999%';
There is a composite type of the same name for every table that you create in PostgreSQL. And there is a text representation for every type in PostgreSQL (to input / output values).
Therefore you can just cast the whole row to text and if the string '999999' is contained in any column (its text representation, to be precise) it is guaranteed to show in the query above.
You cannot rule out false positives completely, though: if the separators and/or decorators Postgres uses for the row representation can be part of the search term, they may match. That is very unlikely, and certainly not the case for your search term '999999'.
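To see where such false positives could come from, inspect the text representation of a row; the ROW constructor below is purely illustrative:

```sql
-- A composite value's text form is parenthesized and comma-separated;
-- NULLs appear as empty strings:
SELECT ROW(1, 'abc', NULL)::text;  -- (1,abc,)

-- Hence a search term containing ',' or '(' could match the
-- decorators rather than any actual column value.
```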
There was a very similar question on codereview.SE recently. I added some more explanation in my answer there.
create or replace function test_values(real) returns setof record as
$$
declare
    query  text;
    output record;
begin
    for query in
        select 'select distinct ''' || table_name || '''::text table_name, '''
               || column_name || '''::text column_name from '
               || quote_ident(table_name) || ' where '
               || quote_ident(column_name) || ' = ''' || $1::text || '''::' || data_type
        from information_schema.columns
        where table_schema = 'public'
          and numeric_precision is not null
    loop
        raise notice 'running: %', query;  -- debug output
        execute query into output;
        return next output;
    end loop;
    return;
end;
$$ language plpgsql;

select distinct * from test_values(999999) as t(table_name text, column_name text);
Related
I'd like to find all columns in my Oracle database schema that contain only numeric data but have a non-numeric type. (So basically, column candidates whose data type was probably chosen incorrectly.)
I have a query for all varchar2-columns:
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM user_tab_cols
WHERE DATA_TYPE = 'VARCHAR2';
Furthermore I have a query to check for any non-numeric data inside a table myTable and a column myColumn:
SELECT 1
FROM myTable
WHERE NOT REGEXP_LIKE(myColumn, '^[[:digit:]]+$');
I'd like to combine both queries so that the first query returns only the rows for which the second query finds no non-numeric data.
The main problem is that the first query operates on the meta layer of the data dictionary, where TABLE_NAME and COLUMN_NAME come back as data, while I need them as identifiers (not as data) in the second query.
In pseudo-SQL I have something like that in mind:
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM user_tab_cols
WHERE DATA_TYPE = 'VARCHAR2'
AND NOT EXISTS
(SELECT 1 from asIdentifier(TABLE_NAME)
WHERE NOT REGEXP_LIKE(asIdentifier(COLUMN_NAME), '^[[:digit:]]+$'));
Create a function like this:
create or replace function isNumeric(val in VARCHAR2) return INTEGER AS
res NUMBER;
begin
res := TO_NUMBER(val);
RETURN 1;
EXCEPTION
WHEN OTHERS THEN
RETURN 0;
END;
Then you can use it like this:
DECLARE
r integer;
BEGIN
For aCol in (SELECT TABLE_NAME, COLUMN_NAME FROM user_tab_cols WHERE DATA_TYPE = 'VARCHAR2') LOOP
-- What about CHAR and CLOB data types?
execute immediate 'select count(*) from '||aCol.TABLE_NAME||' WHERE isNumeric('||aCol.COLUMN_NAME||') = 0' into r;
if r = 0 then
DBMS_OUTPUT.put_line(aCol.TABLE_NAME ||' '||aCol.COLUMN_NAME ||' contains numeric values only');
end if;
end loop;
end;
Note, the performance of this PL/SQL block will be poor. Hopefully this is a one-time-job only.
There are two possible approaches: dynamic SQL (DSQL) and XML.
The first one was already demonstrated in another reply, and it's faster.
XML approach just for fun
create or replace function to_number_udf(p in varchar2) return number
deterministic is
pragma udf;
begin
return p * 0;
exception when invalid_number or value_error then return 1;
end to_number_udf;
/
create table t_chk(str1, str2) as
select '1', '2' from dual union all
select '0001.1000', 'helloworld' from dual;
column owner format a20
column table_name format a20
column column_name format a20

with tabs_to_check as
(
  select 'collection("oradb:/'||owner||'/'||table_name||'")/ROW/'||column_name||'/text()' x,
         atc.*
  from all_tab_columns atc
  where table_name = 'T_CHK'
    and data_type = 'VARCHAR2'
    and owner = user
)
select --+ no_query_transformation
       owner, table_name, column_name
from tabs_to_check ttc, xmltable(x columns "." varchar2(4000)) x
group by owner, table_name, column_name
having max(to_number_udf(".")) = 0;
OWNER TABLE_NAME COLUMN_NAME
-------------------- -------------------- --------------------
TEST T_CHK STR1
PS. On Oracle 12.2 you can use to_number(... default ... on conversion error) instead of UDF.
A faster way to check whether a string is all digits or contains at least one non-digit character is the translate function. Alas, due to the non-standard way Oracle handles empty strings, the form of the function we must use is a little complicated:
translate(input_string, 'z0123456789', 'z')
(z can be any non-digit character; we need it so that the third argument is not null.) This works by translating z to itself and the digits 0 through 9 to nothing. So if, and only if, the input string was null or all digits, the function returns null.
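A quick sanity check of the trick (the literals are made up for illustration):

```sql
SELECT translate('123456', 'z0123456789', 'z') FROM dual;  -- NULL: all digits
SELECT translate('12a456', 'z0123456789', 'z') FROM dual;  -- 'a': the non-digit survives
SELECT translate(NULL,     'z0123456789', 'z') FROM dual;  -- NULL: null input
```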
In addition: to make the process faster, you can test each column with an EXISTS condition. If a column is not meant to be numeric, then in most cases the EXISTS condition will become true very quickly, so you will have to inspect a very small number of values from such columns.
As I tried to make this work, I ran into numerous side issues. Presumably you want to look in all schemas (except SYS and perhaps SYSTEM), so you need to run the procedure (anonymous block) from an account with SYSDBA privileges. Then I ran into issues with non-standard table and column names (names starting with an underscore and such), which brought to mind identifiers defined in double quotes - a terrible practice.
For illustration, I will use the HR schema - on which the approach worked. You may need to tweak this further; I wasn't able to make it work by changing the line
and owner = 'HR'
to
and owner != 'SYS'
So - with this long intro - here is what I did.
First, in a "normal" user account (my own, named INTRO; I run a very small database with only one "normal" user, plus the "standard" Oracle users like SCOTT, HR etc.), I created a table to receive the owner name, table name and column name of every VARCHAR2 column that contains only "numeric" values or null ("numeric" defined the way you did). NOTE: if you then want to really check for all numeric values, you will indeed need a regular expression, or something like what Wernfried has shown; otherwise I would still use an EXISTS condition rather than a COUNT in the anonymous procedure.
Then I created an anonymous block to find the needed columns. NOTE: You will not have a schema INTRO - so change it everywhere in my code (both in creating the table and in the anonymous block). If the procedure completes successfully, you should be able to query the table. I show that at the end too.
While logged in as SYS (or another user with SYSDBA powers):
create table intro.cols_with_numbers (
owner_name varchar2(128),
table_name varchar2(128),
column_name varchar2(128)
);
declare x number;
begin
execute immediate 'truncate table intro.cols_with_numbers';
for t in ( select owner, table_name, column_name
from dba_tab_columns
where data_type like 'VARCHAR2%'
and owner = 'HR'
)
loop
execute immediate 'select case when exists (
select *
from ' || t.owner || '.' || t.table_name ||
' where translate(' || t.column_name || ',
''z0123456789'', ''z'') is not null
) then 1 end
from dual'
into x;
if x is null then
insert into intro.cols_with_numbers (owner_name, table_name, column_name)
values(t.owner, t.table_name, t.column_name);
end if;
end loop;
end;
/
Run this procedure and then query the table:
select * from intro.cols_with_numbers;
no rows selected
(which means there were no numeric columns in tables in the HR schema, in the wrong data type VARCHAR2 - or at least, no such columns that had only non-negative integer values.) You can test further, by intentionally creating a table with such a column and testing to see it is "caught" by the procedure.
ADDED - Here is what happens when I change the owner from 'HR' to 'SCOTT':
PL/SQL procedure successfully completed.
OWNER_NAME TABLE_NAME COLUMN_NAME
-------------------- -------------------- --------------------
SCOTT BONUS JOB
SCOTT BONUS ENAME
so it seems to work fine (although on other schemas I sometimes run into an error... I'll see if I can figure out what that is).
In this case the table is empty (no rows!) - this is one example of a "false positive" you may find. (More generally, you will get a false positive if everything in a VARCHAR2 column is null - in all rows of the table.)
NOTE also that a column may contain only numeric values and still be best kept as VARCHAR2. This is the case when the values are simply identifiers and are not meant as "numbers" (values we compare to each other or to fixed values, and/or do arithmetic with). Example: a Social Security Number (SSN), or its equivalent in other countries; the SSN is a person's "official" identifier for doing business with the government. An SSN is all digits, yet, perhaps to emphasize that it is not supposed to be a "number" despite its name, it is often written with a couple of dashes.
It seems like several months ago I came across a SO question covering this but I can't seem to find it now.
Basically, I want to do two things.
First, a number of tables were made with several columns numeric(20,2) and I want to just change them all to numeric. The statement is simple enough for one column:
ALTER TABLE table_name
ALTER COLUMN code
TYPE numeric;
Takes care of that.
Second, on these columns I want to remove any trailing zeros:
UPDATE table_name
SET code = replace(replace(code::text, '.50', '.5'), '.00', '')::numeric;
I'm having difficulty figuring out how to automate this so that I only have to specify the table and it cleans up every matching column. I'm pretty sure this is possible.
You can find all of the columns with the data type that you want to change with a statement like:
select column_name, table_name
from information_schema.columns
where data_type='numeric'
and numeric_precision = 20
and numeric_scale = 2;
You can iterate over the result with a custom function or with a DO command such as:
do $$
declare
    t record;
begin
    for t in select column_name, table_name
             from information_schema.columns
             where data_type = 'numeric'
               and numeric_precision = 20
               and numeric_scale = 2   -- no semicolon here: it would break the FOR loop
    loop
        execute format('alter table %I alter column %I type numeric',
                       t.table_name, t.column_name);
    end loop;
end$$;
Also, to remove trailing zeroes, a more general solution is to cast the value to float or double precision and then back to numeric, e.g:
set code = cast(cast(code as double precision) as numeric);
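One caveat with that round trip: double precision holds only 15-17 significant digits, so very large numeric(20,2) values can lose precision. On PostgreSQL 13 or later, trim_scale() drops trailing zeroes while staying within numeric:

```sql
UPDATE table_name
SET code = trim_scale(code);  -- 12.50 -> 12.5, 12.00 -> 12
```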
In my database, I have the standard app tables and backup tables. Eg. for a table "employee", I have a table called "bak_employee". The bak_employee table is a backup of the employee table. I use it to restore the employee table between tests.
I figured I can use these "bak_" tables to see the changes that occurred during the test, like this:
SELECT * FROM employee EXCEPT SELECT * FROM bak_employee
This will show me the inserted and updated records. I'll ignore the deleted records for now.
Now, what I would like to do is go through all the tables in my database to see if there's any changes in any of the tables. I was thinking of doing this as a function so it's easy to call over and over. This is what I have so far:
CREATE OR REPLACE FUNCTION public.show_diff()
RETURNS SETOF diff_tables AS
$BODY$
DECLARE
app_tables text;
BEGIN
FOR app_tables IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'myDatabase'
AND table_schema = 'public'
AND table_name not like 'bak_%' -- exclude existing backup tables
LOOP
-- somehow loop through tables to see what's changed something like:
EXECUTE 'SELECT * FROM ' || app_tables || ' EXCEPT SELECT * FROM bak_' || app_tables;
END LOOP;
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
But obviously this isn't going to return me any useful information. Any help would be appreciated.
You cannot return various well-known row types from the same function in the same call. A cheap fix is to cast each row type to text, so we have a common return type.
CREATE OR REPLACE FUNCTION public.show_diff()
RETURNS SETOF text AS -- text!!
$func$
DECLARE
app_table text;
BEGIN
FOR app_table IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'myDatabase'
AND table_schema = 'public'
AND table_name NOT LIKE 'bak_%' -- exclude existing backup tables
LOOP
RETURN NEXT ' ';
RETURN NEXT '=== ' || app_table || ' ===';
RETURN QUERY EXECUTE format(
'SELECT x::text FROM (TABLE %I EXCEPT ALL TABLE %I) x'
, app_table, 'bak_' || app_table);
END LOOP;
RETURN;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM public.show_diff();
I had the test suggested by @a_horse at first, but after your comment I realized that there is no need for it: EXCEPT considers NULL values equal and shows all differences.
While being at it, I improved and simplified your solution some more. Use EXCEPT ALL: it is cheaper and does not run the risk of folding complete duplicates.
Using EXCEPT clause in PostgreSQL
TABLE is just syntactical sugar.
Is there a shortcut for SELECT * FROM in psql?
However, if you have an index on a unique (combination of) column(s), a JOIN like I suggested before should be faster: finding the only possible duplicate via index should be substantially cheaper.
Crucial element is the cast the row type to text (x::text).
You can even make the function work for any table - but never more than one at a time: With a polymorphic parameter type:
Refactor a PL/pgSQL function to return the output of various SELECT queries
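For reference, the TABLE shorthand mentioned above expands like this (employee is the table from the question):

```sql
TABLE employee;
-- is equivalent to:
SELECT * FROM employee;
```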
I am trying to update a bunch of columns in a DB for testing purposes of a feature. I have a table that is built with hibernate so all of the columns that are created for an embedded entity begin with the same name. I.e. contact_info_address_street1, contact_info_address_street2, etc.
I am trying to figure out if there is a way to do something to the affect of:
UPDATE table SET contact_info_address_* = null;
If not, I know I can do it the long way, just looking for a way to help myself out in the future if I need to do this all over again for a different set of columns.
You need dynamic SQL for this. So you must defend against possible SQL injection.
Basic query
The basic query to generate the DML command needed can look like this:
SELECT format('UPDATE tbl SET (%s) = (%s)'
,string_agg (quote_ident(attname), ', ')
,string_agg ('NULL', ', ')
)
FROM pg_attribute
WHERE attrelid = 'tbl'::regclass
AND NOT attisdropped
AND attnum > 0
AND attname ~~ 'foo_%';
Returns:
UPDATE tbl SET (foo_a, foo_b, foo_c) = (NULL, NULL, NULL);
Make use of the "column-list syntax" of UPDATE to shorten the code and simplify the task.
I query the system catalogs instead of information schema because the latter, while being standardized and guaranteed to be portable across major versions, is also notoriously slow and sometimes unwieldy. There are pros and cons, see:
Get column names and data types of a query, table or view
quote_ident() for the column names prevents SQL-injection - also necessary for identifiers.
string_agg() requires 9.0+.
Full automation with PL/pgSQL function
CREATE OR REPLACE FUNCTION f_update_cols(_tbl regclass, _col_pattern text
, OUT row_ct int, OUT col_ct bigint)
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text;
BEGIN
SELECT INTO _sql, col_ct
       format('UPDATE %s SET (%s) = (%s)'
            , _tbl
            , string_agg (quote_ident(attname), ', ')
            , string_agg ('NULL', ', ')
             )
, count(*)
FROM pg_attribute
WHERE attrelid = _tbl
AND NOT attisdropped -- no dropped columns
AND attnum > 0 -- no system columns
AND attname LIKE _col_pattern; -- only columns matching pattern
-- RAISE NOTICE '%', _sql; -- output SQL for debugging
EXECUTE _sql;
GET DIAGNOSTICS row_ct = ROW_COUNT;
END
$func$;
COMMENT ON FUNCTION f_update_cols(regclass, text)
IS 'Updates all columns of table _tbl ($1)
that match _col_pattern ($2) in a LIKE expression.
Returns the count of columns (col_ct) and rows (row_ct) affected.';
Call:
SELECT * FROM f_update_cols('myschema.tbl', 'foo%');
To make the function more practical, it returns information as described in the comment. More about obtaining the result status in plpgsql in the manual.
I use the variable _sql to hold the query string, so I can collect the number of columns found (col_ct) in the same query.
The object identifier type regclass is the most efficient way to automatically avoid SQL injection (and sanitize non-standard names) for the table name, too. You can use schema-qualified table names to avoid ambiguities. I would advise to do so if you (can) have multiple schemas in your db! See:
Table name as a PostgreSQL function parameter
db<>fiddle here
Old sqlfiddle
There's no handy shortcut, sorry. If you have to do this kind of thing a lot, you could create a function that dynamically executes SQL to achieve your goal.
CREATE OR REPLACE FUNCTION reset_cols() RETURNS boolean AS $$ BEGIN
EXECUTE (select 'UPDATE table SET '
|| array_to_string(array(
select column_name::text
from information_schema.columns
where table_name = 'table'
and column_name::text like 'contact_info_address_%'
),' = NULL,')
|| ' = NULL');
RETURN true;
END; $$ LANGUAGE plpgsql;
-- run the function
SELECT reset_cols();
It's not very nice though. A better function would accept the table name and column prefix as arguments. Which I'll leave as an exercise for the reader :)
Is it possible to do the following in Postgres:
SELECT column_name FROM information_schema.columns WHERE table_name = 'somereport' AND data_type = 'integer';
SELECT SUM(column_name[0]), SUM(column_name[1]), SUM(column_name[3]) FROM somereport;
In other words I need to select a group of columns from a table depending on certain criteria, and then sum each of those columns in the table.
I know I can do this in a loop, so I can count each column independently, but obviously that requires a query for each column returned from the information schema query. Eg:
FOR r IN select column_name from information_schema.columns where table_name = 'somereport' and data_type = 'integer'
LOOP
SELECT SUM(r.column_name) FROM somereport;
END
This query creates the complete DML statement you are after:
WITH x AS (
SELECT 'public'::text AS _schema -- provide schema name ..
,'somereport'::text AS _tbl -- .. and table name once
)
SELECT 'SELECT ' || string_agg('sum(' || quote_ident(column_name)
|| ') AS sum_' || quote_ident(column_name), ', ')
|| E'\nFROM ' || quote_ident(x._schema) || '.' || quote_ident(x._tbl)
FROM x, information_schema.columns
WHERE table_schema = _schema
AND table_name = _tbl
AND data_type = 'integer'
GROUP BY x._schema, x._tbl;
You can execute it separately or wrap this query in a plpgsql function and run the query automatically with EXECUTE:
Full automation
Tested with PostgreSQL 9.1.4
CREATE OR REPLACE FUNCTION f_get_sums(_schema text, _tbl text)
RETURNS TABLE(names text[], sums bigint[]) AS
$BODY$
BEGIN
RETURN QUERY EXECUTE (
SELECT 'SELECT ''{'
|| string_agg(quote_ident(c.column_name), ', ' ORDER BY c.column_name)
|| '}''::text[],
ARRAY['
|| string_agg('sum(' || quote_ident(c.column_name) || ')'
, ', ' ORDER BY c.column_name)
|| ']
FROM '
|| quote_ident(_schema) || '.' || quote_ident(_tbl)
FROM information_schema.columns c
WHERE table_schema = _schema
AND table_name = _tbl
AND data_type = 'integer'
);
END;
$BODY$
LANGUAGE plpgsql;
Call:
SELECT unnest(names) AS name, unnest(sums) AS col_sum
FROM f_get_sums('public', 'somereport');
Returns:
name | col_sum
---------------+---------
int_col1 | 6614
other_int_col | 8364
third_int_col | 2720642
Explain
The difficulty is to define the RETURN type for the function, while number and names of columns returned will vary. One detail that helps a little: you only want integer columns.
I solved this by forming an array of bigint (sum(int_col) returns bigint). In addition I return an array of column names. Both sorted alphabetically by column name.
In the function call I split up these arrays with unnest() arriving at the handsome format displayed.
The dynamically created and executed query is advanced stuff. Don't get confused by multiple layers of quotes. Basically you have EXECUTE that takes a text argument containing the SQL query to execute. This text, in turn, is provided by secondary SQL query that builds the query string of the primary query.
If this is too much at once or plpgsql is rather new for you, start with this related answer where I explain the basics dealing with a much simpler function and provide links to the manual for the major features.
If performance is essential, query the Postgres catalog directly (pg_catalog.pg_attribute) instead of the standardized (but slow) information_schema.columns. Here is a simple example with pg_attribute.
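A minimal sketch of such a catalog query (the table name public.somereport is taken from the question):

```sql
SELECT a.attname                          -- column name
FROM   pg_catalog.pg_attribute a
WHERE  a.attrelid = 'public.somereport'::regclass
AND    a.atttypid = 'int4'::regtype       -- integer columns only
AND    a.attnum > 0                       -- skip system columns
AND    NOT a.attisdropped;                -- skip dropped columns
```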