Set empty strings ('') to NULL in the whole database - sql

My database has many text columns whose values are empty strings (''). These empty strings need to be set to NULL. I do not know the exact schemas, tables and columns in this database; rather, I want to write a general solution that can be reused.
How would I write a query / function to find all text columns in all tables in all schemas and update all columns with empty strings ('') to NULL?

The most efficient way to achieve this:
Run a single UPDATE per table.
Only update nullable columns (not defined NOT NULL) with any actual empty string.
Only update rows with any actual empty string.
Leave other values unchanged.
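The four rules above can be illustrated with a minimal Python sketch that only builds the statement text. It is not the answer's PL/pgSQL code: the helper names quote_ident and build_empty2null_sql are hypothetical, and the sketch assumes the caller already knows which columns are nullable character columns (in PostgreSQL that list would come from pg_attribute).

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_empty2null_sql(table: str, columns: list[str]) -> str:
    """Build ONE UPDATE for a table that:
    - sets each listed (nullable, character) column to NULLIF(col, ''),
      which leaves non-empty values unchanged, and
    - only touches rows where at least one of those columns is ''.
    """
    cols = [quote_ident(c) for c in columns]
    set_list = ", ".join(f"{c} = NULLIF({c}, '')" for c in cols)
    where = " OR ".join(f"{c} = ''" for c in cols)
    return f"UPDATE {quote_ident(table)} SET {set_list} WHERE {where}"


print(build_empty2null_sql("customer", ["name", "note"]))
```

One statement per table means one pass over each table; NULLIF() keeps non-empty values as they are, and the OR-ed WHERE clause skips rows that contain no empty string at all.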
This related answer has a PL/pgSQL function that builds and runs the UPDATE command automatically and safely for any given table, using the system catalog pg_attribute:
Replace empty strings with null values
Using the function f_empty2null() from this answer, you can loop through selected tables like this:
DO
$do$
DECLARE
   _tbl regclass;
BEGIN
   FOR _tbl IN
      SELECT c.oid::regclass
      FROM   pg_class c
      JOIN   pg_namespace n ON n.oid = c.relnamespace
      WHERE  c.relkind = 'r'                     -- only regular tables
      AND    n.nspname NOT LIKE 'pg\_%'          -- exclude system schemas ("_" alone is a LIKE wildcard)
      AND    n.nspname <> 'information_schema'   -- exclude the information schema, too
   LOOP
      RAISE NOTICE $$PERFORM f_empty2null('%');$$, _tbl;
      -- PERFORM f_empty2null(_tbl);  -- uncomment to prime the bomb
   END LOOP;
END
$do$;
Careful! This updates all empty strings in all columns of all user tables in the DB. Be sure that's what you want or it might nuke your database.
You need UPDATE privileges on all selected tables, of course.
As a child safety device, I commented out the payload.
You may have noted that I use the system catalogs directly, not the information schema (which would work, too). About this:
How to check if a table exists in a given schema
Query to return output column names and data types of a query, table or view
For repeated use
Here is an integrated solution for repeated use, without safety devices:
CREATE OR REPLACE FUNCTION f_all_empty2null(OUT _tables int, OUT _rows int) AS
$func$
DECLARE
   _typ CONSTANT regtype[] := '{text, bpchar, varchar, \"char\"}';  -- \" keeps the quotes: the internal "char" type
   _sql text;
   _row_ct int;
BEGIN
   _tables := 0; _rows := 0;
   FOR _sql IN
      SELECT format('UPDATE %s SET %s WHERE %s'
                  , t.tbl
                  , string_agg(format($$%1$s = NULLIF(%1$s, '')$$, t.col), ', ')
                  , string_agg(t.col || $$ = ''$$, ' OR '))
      FROM  (
         SELECT c.oid::regclass AS tbl, quote_ident(attname) AS col
         FROM   pg_namespace n
         JOIN   pg_class     c ON c.relnamespace = n.oid
         JOIN   pg_attribute a ON a.attrelid = c.oid
         WHERE  n.nspname NOT LIKE 'pg\_%'          -- exclude system schemas
         AND    n.nspname <> 'information_schema'   -- exclude the information schema
         AND    c.relkind = 'r'                     -- only regular tables
         AND    a.attnum >= 1                       -- exclude tableoid & friends
         AND    NOT a.attisdropped                  -- exclude dropped columns
         AND    NOT a.attnotnull                    -- exclude columns defined NOT NULL!
         AND    a.atttypid = ANY(_typ)              -- only character types
         ORDER  BY a.attnum
         ) t
      GROUP  BY t.tbl
   LOOP
      EXECUTE _sql;
      GET DIAGNOSTICS _row_ct = ROW_COUNT;  -- report nr. of affected rows
      _tables := _tables + 1;
      _rows   := _rows + _row_ct;
   END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_all_empty2null();
Returns:
_tables | _rows
---------+---------
23 | 123456
Note how I escaped both table and column names properly!
c.oid::regclass AS tbl, quote_ident(attname) AS col
Consider:
Table name as a PostgreSQL function parameter
Careful! Same warning as above.
Also consider the basic explanation in the answer I linked above:
Replace empty strings with null values
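To try the same idea without a PostgreSQL server, here is a rough Python sketch that uses the standard sqlite3 module as a stand-in. Assumptions: sqlite_master and PRAGMA table_info replace pg_class and pg_attribute, "character column" means SQLite TEXT affinity (declared type containing CHAR, CLOB or TEXT), and the function name all_empty2null is hypothetical. Like f_all_empty2null(), it skips NOT NULL columns, runs one UPDATE per table, and returns the number of tables and rows touched.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def all_empty2null(conn: sqlite3.Connection) -> tuple[int, int]:
    """Run one UPDATE per user table; return (tables, rows) touched."""
    tables = rows = 0
    # sqlite_master plays the role of pg_class; skip SQLite's internal tables.
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite\\_%' ESCAPE '\\'")]
    for tbl in names:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f"PRAGMA table_info({quote_ident(tbl)})").fetchall()
        cols = [quote_ident(name) for _cid, name, typ, notnull, _dflt, _pk in info
                if not notnull  # a NOT NULL column cannot take NULL
                and any(t in typ.upper() for t in ("CHAR", "CLOB", "TEXT"))]
        if not cols:
            continue  # no nullable character columns in this table
        sql = (f"UPDATE {quote_ident(tbl)} SET "
               + ", ".join(f"{c} = NULLIF({c}, '')" for c in cols)
               + " WHERE " + " OR ".join(f"{c} = ''" for c in cols))
        tables += 1
        rows += conn.execute(sql).rowcount
    return tables, rows


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, nick VARCHAR(20),
                         code TEXT NOT NULL);
    INSERT INTO person (name, nick, code) VALUES
        ('Ann', '', ''), ('', '', 'x'), ('Bob', 'b', 'y');
    CREATE TABLE empty_tbl (v TEXT);
""")
result = all_empty2null(conn)
print(result)  # (2, 2)
```

The NOT NULL column code keeps its empty string, and empty_tbl is counted as a processed table even though no row in it changes, just as _tables counts every generated UPDATE.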

The simplest way to go about it is manually, table by table. For each table, do something like this:
START TRANSACTION;
UPDATE tablename SET
   stringfield1 = NULLIF(stringfield1, ''),
   stringfield2 = NULLIF(stringfield2, '');
-- run some SELECTs here to make sure everything looks right
COMMIT;
That will rewrite every row in the table, but it will make only one pass over the table. It may or may not be practical for you.
You might instead want to do it field by field, using a WHERE clause to reduce the number of updates, like this:
START TRANSACTION;
UPDATE tablename SET stringfield1 = NULL WHERE stringfield1 = '';
-- run some SELECTs here to make sure everything looks right
COMMIT;
That will only rewrite the rows that need to be rewritten, but will require multiple passes over each table.
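The trade-off between these two variants can be made visible with a small Python sketch, using the standard sqlite3 module as a stand-in database (an assumption; the answer does not name one). It counts rows written via cursor.rowcount, rolls back after each variant, and adds the combined form from the first answer for comparison. Note that it measures rows written, not passes over the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (s1 TEXT, s2 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("a", "b"), ("", "c"), ("d", "")])
conn.commit()  # keep the sample data; each variant below is rolled back


def rows_written(*statements: str) -> int:
    """Run the statements in one transaction, roll back, return rows written."""
    total = sum(conn.execute(s).rowcount for s in statements)
    conn.rollback()  # restore the sample data for the next variant
    return total


# One pass, no WHERE: NULLIF keeps non-empty values, but every row is written.
one_pass = rows_written("UPDATE t SET s1 = NULLIF(s1, ''), s2 = NULLIF(s2, '')")
# One UPDATE per column with WHERE: only rows that need it, one pass per column.
per_column = rows_written("UPDATE t SET s1 = NULL WHERE s1 = ''",
                          "UPDATE t SET s2 = NULL WHERE s2 = ''")
# Combined form (as in the first answer): one pass, only rows containing ''.
combined = rows_written("UPDATE t SET s1 = NULLIF(s1, ''), s2 = NULLIF(s2, '') "
                        "WHERE s1 = '' OR s2 = ''")
print(one_pass, per_column, combined)  # 3 2 2
```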

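I think the code below is a generalized solution that you can use anytime, anywhere. Note that it is written in T-SQL for SQL Server:
DECLARE @Query nvarchar(max);
DECLARE @AllDatabaseTables TABLE (id int IDENTITY(1,1), Table_Schema sysname, Table_Name sysname);
INSERT INTO @AllDatabaseTables (Table_Schema, Table_Name)
SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.tables
WHERE TABLE_TYPE = 'BASE TABLE'; -- skip views
DECLARE @ColumnTable TABLE (id int, ColumnName sysname);
DECLARE @Table_Schema sysname, @Table_Name sysname, @ColumnName sysname;
DECLARE @i int = 1, @k int;
WHILE @i <= (SELECT COUNT(*) FROM @AllDatabaseTables)
BEGIN
    SELECT @Table_Schema = Table_Schema, @Table_Name = Table_Name
    FROM @AllDatabaseTables WHERE id = @i;
    DELETE FROM @ColumnTable; -- start each table with an empty column list
    INSERT INTO @ColumnTable (id, ColumnName)
    SELECT ROW_NUMBER() OVER (ORDER BY ORDINAL_POSITION), COLUMN_NAME
    FROM information_schema.columns
    WHERE TABLE_SCHEMA = @Table_Schema AND TABLE_NAME = @Table_Name
      AND DATA_TYPE IN ('varchar', 'nvarchar') -- only (n)varchar columns
      AND IS_NULLABLE = 'YES';                  -- NOT NULL columns cannot be set to NULL
    SET @k = 1;
    WHILE @k <= (SELECT COUNT(*) FROM @ColumnTable)
    BEGIN
        SELECT @ColumnName = ColumnName FROM @ColumnTable WHERE id = @k;
        -- QUOTENAME guards against unusual (or malicious) identifiers
        SET @Query = N'UPDATE ' + QUOTENAME(@Table_Schema) + N'.' + QUOTENAME(@Table_Name)
                   + N' SET ' + QUOTENAME(@ColumnName) + N' = NULL WHERE '
                   + QUOTENAME(@ColumnName) + N' = ''''';
        EXEC (@Query);
        SET @k = @k + 1;
    END
    SET @i = @i + 1;
END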

Related

Drop table if it exists with DB2/400 SQL

My goal is pretty straightforward: if a table has rows, drop it.
Although there are several similar answers, none of them worked for me.
DB2 Drop table if exists equivalent
Suggested solution:
IF EXISTS (SELECT name FROM sysibm.systables WHERE name = 'mylib.mytable') THEN
DROP TABLE mylib.mytable;END IF;
Result:
SQL State: 42601 Vendor Code: -199 Message: [SQL0199] Keyword IF not expected.
Valid tokens: ( CL END GET SET CALL DROP FREE HOLD LOCK OPEN WITH ALTER BEGIN
Drop DB2 table if exists
Suggested solution:
--#SET TERMINATOR #
begin
declare statement varchar(128);
declare continue handle for sqlstate '42710' BEGIN END;
SET STATEMENT = 'DROP TABLE MYLIB.MYTABLE';
EXECUTE IMMEDIATE STATEMENT;
end #
Result:
Message: [SQL0104] Token HANDLE was not valid. Valid tokens: HANDLER. Or, if I replace HANDLE with HANDLER:
Message: [SQL0199] Keyword STATEMENT not expected. Valid tokens: SQL PATH RESULT SCHEMA CURRENT CONNECTION DESCRIPTOR.
From answer about views
Suggested solution:
DROP TABLE MY_TABLE ONLY IF EXISTS source.
Result:
Message: [SQL0104] Token ONLY was not valid. Valid tokens: RESTRICT CASCADE
So I wonder whether an alternative solution exists. A CL solution would also be interesting.
I'm assuming you may want to do this more than once, so a procedure might be in order.
CREATE or replace PROCEDURE DROP_LIVE_TABLE
    (in #table varchar(10)
    ,in #library varchar(10)
    )
BEGIN
    declare #stmt varchar(100);
    declare #cnt int;
    IF exists( select * from systables
               where sys_dname = #library
                 and sys_tname = #table
                 and table_type in ('P','T')
             ) THEN
        SELECT int(sum(number_rows))
          INTO #cnt
          from SYSTABLESTAT
         where sys_dname = #library
           and sys_tname = #table
        ;
        IF #cnt > 0 THEN
            set #stmt = 'DROP TABLE '||#library||'.'||#table||' CASCADE';
            execute immediate #stmt;
        END IF;
    END IF;
    RETURN;
END;
The CASCADE keyword causes any dependent objects, such as indexes, logical files or views, to be deleted as well.
Here is a CL answer to this question:
PGM PARM(&FILENAME)
DCL VAR(&FILENAME) TYPE(*CHAR) LEN(10)
DCL VAR(&NUMRECS) TYPE(*DEC) LEN(10 0)
RTVMBRD FILE(&FILENAME) NBRCURRCD(&NUMRECS)
IF COND(&NUMRECS > 0) THEN(DLTF +
FILE(&FILENAME))
OUT: ENDPGM
This solution would have trouble if the physical file has dependencies such as indexes or logical files. Those dependencies would have to be deleted first.
The solution by @danny117, on the other hand, does not work in all environments. For example, I was unable to coerce it to work in the SQuirreL client, but it does work in i Navigator. It also works in RUNSQLSTM, but I was unable to determine how to make it work with unqualified table references. If the tables are unqualified, RUNSQLSTM uses the default collection from DFTRDBCOL, and the CURRENT_SCHEMA special register does not return the value from DFTRDBCOL.
Here is the "if the table has rows, drop it" solution using a compound statement:
begin
  if( exists(
        select 1 from qsys2.systables
        where table_schema = 'MYLIB'
          and table_name = 'MYTABLE'
      )) then
    if( exists(
          select 1 from mylib.mytable
        )) then
      drop table mylib.mytable;
    end if;
  end if;
end;
I am guessing at the reason you want to do this, but if it is to allow the creation of a new table, then the best way may be CREATE OR REPLACE TABLE, if you are on IBM i v7.2 or later.
If all you want to do is make sure you have an empty table, TRUNCATE (v7.2+) or DELETE may be better options.
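The two checks in the compound statement (does the table exist, and does it have rows) can be sketched in Python with the standard sqlite3 module standing in for DB2 for i. This illustrates the control flow only: sqlite_master replaces QSYS2.SYSTABLES, the function name drop_if_has_rows is hypothetical, and SQLite's DROP TABLE has no CASCADE option.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def drop_if_has_rows(conn: sqlite3.Connection, table: str) -> bool:
    """Drop `table` only if it exists and contains at least one row.

    Returns True if the table was dropped.
    """
    # Step 1: does the table exist? (sqlite_master stands in for QSYS2.SYSTABLES)
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,)).fetchone() is not None
    if not exists:
        return False
    # Step 2: does it have rows? EXISTS stops at the first row found.
    has_rows = conn.execute(
        f"SELECT EXISTS (SELECT 1 FROM {quote_ident(table)})").fetchone()[0]
    if not has_rows:
        return False
    # Step 3: identifiers cannot be bound as parameters, hence the quoting.
    conn.execute(f"DROP TABLE {quote_ident(table)}")
    return True


conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE full_tbl (x INT); INSERT INTO full_tbl VALUES (1);"
                   "CREATE TABLE empty_tbl (x INT);")
dropped = (drop_if_has_rows(conn, "full_tbl"),
           drop_if_has_rows(conn, "empty_tbl"),
           drop_if_has_rows(conn, "missing"))
print(dropped)  # (True, False, False)
```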
Drop the table if it exists, using an atomic compound statement:
BEGIN ATOMIC
  IF( EXISTS(
        SELECT 1 FROM TABLES
        WHERE TABLE_SCHEMA = 'MYLIB'
          AND TABLE_NAME = 'MYTABLE'
      )) THEN
    DROP TABLE MYLIB/MYTABLE; -- system (*SYS) naming; with SQL naming write MYLIB.MYTABLE
  END IF;
END;
Try this:
BEGIN
IF EXISTS (SELECT NAME FROM QSYS2.SYSTABLES WHERE TABLE_SCHEMA = 'YOURLIBINUPPER' AND TABLE_NAME = 'YOURTABLEINUPPER') THEN
DROP TABLE YOURLIB.YOURTABLE;
END IF;
END ;

Find tables, columns with specific value

I'm using Firebird 2.5.0. I know a value and need to find all tables and columns in which it occurs.
I created a procedure:
CREATE OR ALTER PROCEDURE NEW_PROCEDURE (
searching_value varchar(30))
returns (
table_with_value varchar(100),
column_with_value varchar(100))
as
declare variable all_tables varchar(50);
declare variable all_columns varchar(50);
declare variable all_values varchar(50);
begin
FOR SELECT
r.rdb$relation_name, f.rdb$field_name
from rdb$relation_fields f
join rdb$relations r on f.rdb$relation_name = r.rdb$relation_name
and r.rdb$view_blr is null
and (r.rdb$system_flag is null or r.rdb$system_flag = 0)
order by 1, f.rdb$field_position INTO :all_tables, :all_columns
DO
BEGIN
FOR SELECT all_columns FROM all_tables
INTO :all_Values
DO
BEGIN
IF (SEARCHING_VALUE = all_Values) THEN
BEGIN
table_With_Value = all_Tables;
column_With_Value = all_Columns;
SUSPEND;
END
END
END
END^
When I run it I get error message:
Undefined name.
Dynamic SQL Error.
SQL error code = -204.
Table unknown.
ALL_TABLES.
At line 21, column 13.
So in the statement "SELECT all_columns FROM all_tables" it does not take the values from the previous FOR SELECT statement, but just tries to find a table named all_tables. How do I fix it?
The problem is that all_columns is treated as a column name and all_tables as a table name, not as your variables, in:
SELECT all_columns FROM all_tables
You can't parametrize object names in a query like this. Also note that even if it were possible to parametrize object names, you would have had to use :all_columns and :all_tables for disambiguation.
Instead you will need to create a dynamic SQL statement and execute that with EXECUTE STATEMENT (or more specifically: FOR EXECUTE STATEMENT).
In this case:
FOR EXECUTE STATEMENT 'SELECT "' || all_columns || '" FROM "' || all_tables || '"'
INTO :all_values
DO
BEGIN
/* .... */
END
I have quoted the object names to account for case sensitive column and table names (or identifiers that are invalid unquoted). Constructing a query like this might leave you open to SQL injection if the values are obtained from another source than the Firebird metadata tables.
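For comparison, the overall search can be sketched in Python with the standard sqlite3 module as a stand-in for Firebird (sqlite_master and PRAGMA table_info play the role of RDB$RELATIONS and RDB$RELATION_FIELDS). As in the answer, object names are spliced into the statement text with quoting; unlike the original procedure, the search value is passed as a bound parameter and compared in the WHERE clause instead of fetching every value. The function name find_value is hypothetical.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def find_value(conn: sqlite3.Connection, value) -> list[tuple[str, str]]:
    """Return (table, column) pairs whose column contains `value`."""
    hits = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite\\_%' ESCAPE '\\' ORDER BY name")]
    for tbl in tables:
        cols = [r[1] for r in conn.execute(f"PRAGMA table_info({quote_ident(tbl)})")]
        for col in cols:
            # Object names are spliced in (quoted); the value is a bound parameter.
            sql = (f"SELECT 1 FROM {quote_ident(tbl)} "
                   f"WHERE {quote_ident(col)} = ? LIMIT 1")
            if conn.execute(sql, (value,)).fetchone():
                hits.append((tbl, col))
    return hits


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INT, city TEXT);
    INSERT INTO customer VALUES (1, 'Oslo'), (2, 'Rome');
    CREATE TABLE supplier (name TEXT, hq TEXT);
    INSERT INTO supplier VALUES ('Acme', 'Oslo');
""")
hits = find_value(conn, "Oslo")
print(hits)  # [('customer', 'city'), ('supplier', 'hq')]
```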

nzsql - Converting a subquery into columns for another select

Goal: Use a given subquery's results (a single column with many rows of names) to act as the outer select's selection field.
Currently, my subquery is the following:
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'test_table' AND column_name not in ('colRemove');
What I am doing in this subquery is grabbing all the column names from a table (i.e. test_table) and outputting all except for the column name specified (i.e. colRemove). As stated in the "goal", I want to use this subquery as such:
SELECT (*enter subquery from above here*)
FROM actual_table
WHERE (*enter specific conditions*)
I am working on a Netezza SQL server that is version 7.0.4.4. Ideally, I would like to make the entire query executable in one line, but for now, a working solution would be much appreciated. Thanks!
Note: I do not believe that the SQL extensions (i.e. arrays) have been installed, but I will need to double-check this.
A year too late, here's the best I can come up with, but, as you already noticed, it requires a stored procedure to do the dynamic SQL. The stored proc creates a view with all the columns from the source table minus the one you want to exclude.
-- Create test data.
CREATE TABLE test (firstcol INTEGER, secondcol INTEGER, thirdcol INTEGER);
INSERT INTO test (firstcol, secondcol, thirdcol) VALUES (1, 2, 3);
INSERT INTO test (firstcol, secondcol, thirdcol) VALUES (4, 5, 6);
-- Install stored procedure.
CREATE OR REPLACE PROCEDURE CreateLimitedView (varchar(ANY), varchar(ANY)) RETURNS BOOLEAN
LANGUAGE NZPLSQL AS
BEGIN_PROC
DECLARE
    tableName ALIAS FOR $1;
    columnToExclude ALIAS FOR $2;
    colRec RECORD;
    cols VARCHAR(2000); -- Adjust as needed.
    isfirstcol BOOLEAN;
BEGIN
    isfirstcol := true;
    FOR colRec IN EXECUTE
        'SELECT ATTNAME AS NAME FROM _V_RELATION_COLUMN
         WHERE
             NAME=UPPER('||quote_literal(tableName)||')
             AND ATTNAME <> UPPER('||quote_literal(columnToExclude)||')
         ORDER BY ATTNUM'
    LOOP
        IF isfirstcol THEN
            cols := colRec.NAME;
        ELSE
            cols := cols || ', ' || colRec.NAME;
        END IF;
        isfirstcol := false;
    END LOOP;
    -- Should really check if 'LimitedView' already exists as a view, table or synonym.
    EXECUTE IMMEDIATE 'CREATE OR REPLACE VIEW LimitedView AS SELECT ' || cols || ' FROM ' || quote_ident(tableName);
    RETURN true;
END;
END_PROC
;
-- Run the stored proc to create the view.
CALL CreateLimitedView('test', 'secondcol');
-- Select results from the view.
SELECT * FROM limitedView WHERE firstcol = 4;
FIRSTCOL | THIRDCOL
----------+----------
4 | 6
You could have the stored proc return a resultset directly but then you wouldn't be able to filter results with a WHERE clause.
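The same idea (build the select list from the catalog, then wrap it in a view so a WHERE clause still works) can be sketched in Python with the standard sqlite3 module standing in for Netezza. Assumptions: PRAGMA table_info replaces _V_RELATION_COLUMN, create_limited_view is a hypothetical name, and because SQLite has no CREATE OR REPLACE VIEW, the sketch drops the view first.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_limited_view(conn: sqlite3.Connection, view: str, table: str,
                        exclude: str) -> None:
    """(Re)create `view` as all columns of `table` except `exclude`."""
    cols = [r[1] for r in conn.execute(f"PRAGMA table_info({quote_ident(table)})")
            if r[1].lower() != exclude.lower()]  # SQLite names are case-insensitive
    # SQLite lacks CREATE OR REPLACE VIEW, so drop any previous version first.
    conn.execute(f"DROP VIEW IF EXISTS {quote_ident(view)}")
    conn.execute(f"CREATE VIEW {quote_ident(view)} AS SELECT "
                 + ", ".join(quote_ident(c) for c in cols)
                 + f" FROM {quote_ident(table)}")


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (firstcol INTEGER, secondcol INTEGER, thirdcol INTEGER);
    INSERT INTO test VALUES (1, 2, 3), (4, 5, 6);
""")
create_limited_view(conn, "limitedview", "test", "secondcol")
rows = conn.execute("SELECT * FROM limitedview WHERE firstcol = 4").fetchall()
print(rows)  # [(4, 6)]
```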

Update multiple columns that start with a specific string

I am trying to update a bunch of columns in a DB to test a feature. I have a table that is built with Hibernate, so all of the columns that are created for an embedded entity begin with the same name, e.g. contact_info_address_street1, contact_info_address_street2, etc.
I am trying to figure out if there is a way to do something to the effect of:
UPDATE table SET contact_info_address_* = null;
If not, I know I can do it the long way, just looking for a way to help myself out in the future if I need to do this all over again for a different set of columns.
You need dynamic SQL for this. So you must defend against possible SQL injection.
Basic query
The basic query to generate the DML command needed can look like this:
SELECT format('UPDATE tbl SET (%s) = (%s)'
,string_agg (quote_ident(attname), ', ')
,string_agg ('NULL', ', ')
)
FROM pg_attribute
WHERE attrelid = 'tbl'::regclass
AND NOT attisdropped
AND attnum > 0
AND attname ~~ 'foo_%';
Returns:
UPDATE tbl SET (foo_a, foo_b, foo_c) = (NULL, NULL, NULL);
Make use of the "column-list syntax" of UPDATE to shorten the code and simplify the task.
I query the system catalogs instead of information schema because the latter, while being standardized and guaranteed to be portable across major versions, is also notoriously slow and sometimes unwieldy. There are pros and cons, see:
Get column names and data types of a query, table or view
quote_ident() for the column names prevents SQL injection; quoting is also necessary for identifiers with non-standard names (mixed case, spaces, reserved words).
string_agg() requires 9.0+.
Full automation with PL/pgSQL function
CREATE OR REPLACE FUNCTION f_update_cols(_tbl regclass, _col_pattern text
                                       , OUT row_ct int, OUT col_ct bigint)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _sql text;
BEGIN
   SELECT INTO _sql, col_ct
          format('UPDATE %s SET (%s) = ROW(%s)'  -- ROW is required for a single column in Postgres 10+
               , _tbl                            -- regclass is quoted automatically
               , string_agg (quote_ident(attname), ', ')
               , string_agg ('NULL', ', ')
                )
        , count(*)
   FROM   pg_attribute
   WHERE  attrelid = _tbl
   AND    NOT attisdropped            -- no dropped columns
   AND    attnum > 0                  -- no system columns
   AND    attname LIKE _col_pattern;  -- only columns matching pattern

   IF col_ct = 0 THEN                 -- no matching columns: nothing to update
      row_ct := 0;
      RETURN;
   END IF;

   -- RAISE NOTICE '%', _sql;         -- output SQL for debugging
   EXECUTE _sql;
   GET DIAGNOSTICS row_ct = ROW_COUNT;
END
$func$;
COMMENT ON FUNCTION f_update_cols(regclass, text)
IS 'Updates all columns of table _tbl ($1)
that match _col_pattern ($2) in a LIKE expression.
Returns the count of columns (col_ct) and rows (row_ct) affected.';
Call:
SELECT * FROM f_update_cols('myschema.tbl', 'foo%');
To make the function more practical, it returns information as described in the comment. More about obtaining the result status in plpgsql in the manual.
I use the variable _sql to hold the query string, so I can collect the number of columns found (col_ct) in the same query.
The object identifier type regclass is the most efficient way to automatically avoid SQL injection (and sanitize non-standard names) for the table name, too. You can use schema-qualified table names to avoid ambiguities. I advise doing so if you have (or can have) multiple schemas in your DB! See:
Table name as a PostgreSQL function parameter
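The logic of f_update_cols() can also be sketched in Python, with the standard sqlite3 module as a stand-in database (an assumption; this is not PostgreSQL). It matches a plain name prefix with str.startswith, so an underscore in the prefix is literal, unlike in LIKE, where _ matches any single character. It returns (row_ct, col_ct) and skips the UPDATE when no column matches; update_cols is a hypothetical name.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def update_cols(conn: sqlite3.Connection, table: str, prefix: str) -> tuple[int, int]:
    """Set every column of `table` whose name starts with `prefix` to NULL.

    Returns (row_ct, col_ct), mirroring the OUT parameters of f_update_cols().
    """
    cols = [r[1] for r in conn.execute(f"PRAGMA table_info({quote_ident(table)})")
            if r[1].startswith(prefix)]  # plain prefix test: '_' is no wildcard here
    if not cols:
        return 0, 0  # nothing to do; avoids generating invalid SQL
    set_list = ", ".join(f"{quote_ident(c)} = NULL" for c in cols)
    row_ct = conn.execute(f"UPDATE {quote_ident(table)} SET {set_list}").rowcount
    return row_ct, len(cols)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id INT, contact_info_address_street1 TEXT,
                          contact_info_address_street2 TEXT, email TEXT);
    INSERT INTO contact VALUES (1, 'Main St 1', 'Apt 2', 'a@example.com'),
                               (2, 'Side St 3', NULL, 'b@example.com');
""")
result = update_cols(conn, "contact", "contact_info_address_")
print(result)  # (2, 2)
```

The sketch uses the plain "col = NULL, col = NULL" form of SET rather than the column-list syntax, which keeps the generated statement valid for any number of columns.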
There's no handy shortcut, sorry. If you have to do this kind of thing a lot, you could create a function that dynamically executes SQL to achieve your goal.
CREATE OR REPLACE FUNCTION reset_cols() RETURNS boolean AS $$
BEGIN
   EXECUTE (SELECT 'UPDATE table SET '
                || array_to_string(array(
                      SELECT quote_ident(column_name::text)  -- quote odd or mixed-case names
                      FROM   information_schema.columns
                      WHERE  table_name = 'table'
                      AND    column_name::text LIKE 'contact_info_address_%'
                   ), ' = NULL, ')
                || ' = NULL');
   RETURN true;
END; $$ LANGUAGE plpgsql;
-- run the function
SELECT reset_cols();
It's not very nice, though. A better function would accept the table name and column prefix as arguments, which I'll leave as an exercise for the reader :)

How to choose tables on select from all_tables?

I have the following table name template; there are several tables with the same name plus a number at the end: fmj.backup_semaforo_geo_THENUMBER, for example:
select * from fmj.backup_semaforo_geo_06391442
select * from fmj.backup_semaforo_geo_06398164
...
Let's say I need to select a column from every table that matches the 'fmj.backup_semaforo_geo_%' filter. I tried this:
SELECT calle --This column is from the backup_semaforo_geo_# tables
FROM (SELECT table_name
FROM all_tables
WHERE owner = 'FMJ' AND table_name LIKE 'BACKUP_SEMAFORO_GEO_%');
But I'm getting the table names from all_tables instead:
TABLE_NAME
----------
BACKUP_SEMAFORO_GEO_06391442
BACKUP_SEMAFORO_GEO_06398164
...
How can I achieve that without getting the all_tables output?
Thanks.
Presumably your current query is getting ORA-00904: "CALLE": invalid identifier, because the subquery doesn't have a column called CALLE. You can't provide a table name to a query at runtime like that, unfortunately, and have to resort to dynamic SQL.
Something like this will loop through all the tables and for each one will get all the values of CALLE from each one, which you can then loop through. I've used DBMS_OUTPUT to display them, assuming you're doing this in SQL*Plus or something that can deal with that; but you may want to do something else with them.
set serveroutput on
declare
  -- declare a local collection type we can use for bulk collect; use any table
  -- that has the column, or if there isn't a stable one use the actual data
  -- type, varchar2(30) or whatever is appropriate
  type t_values is table of table.calle%type;
  -- declare an instance of that type
  l_values t_values;
  -- declare a cursor to generate the dynamic SQL; where this is done is a
  -- matter of taste (can use 'open x for select ...', then fetch, etc.)
  -- If you run the query on its own you'll see the individual selects from
  -- all the tables
  cursor c1 is
    select table_name,
      'select calle from ' || owner ||'.'|| table_name as query
    from all_tables
    where owner = 'FMJ'
    and table_name like 'BACKUP_SEMAFORO_GEO%'
    order by table_name;
begin
  -- loop around all the dynamic queries from the cursor
  for r1 in c1 loop
    -- for each one, execute it as dynamic SQL, with a bulk collect into
    -- the collection type created above
    execute immediate r1.query bulk collect into l_values;
    -- loop around all the elements in the collection, and print each one
    for i in 1..l_values.count loop
      dbms_output.put_line(r1.table_name ||': ' || l_values(i));
    end loop;
  end loop;
end;
/
Maybe dynamic SQL in a PL/SQL program:
for a in (SELECT table_name
          FROM all_tables
          WHERE owner = 'FMJ' AND table_name LIKE 'BACKUP_SEMAFORO_GEO_%')
LOOP
   sql_stmt := 'SELECT calle FROM FMJ.' || a.table_name;  -- note the space after FROM
   EXECUTE IMMEDIATE sql_stmt;  -- a SELECT needs INTO / BULK COLLECT INTO to return rows
   ...
   ...
END LOOP;
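A rough Python equivalent of the loop in both answers, with the standard sqlite3 module standing in for Oracle (sqlite_master replaces ALL_TABLES; select_from_matching is a hypothetical name). It escapes the LIKE wildcards in the prefix, because in 'BACKUP_SEMAFORO_GEO_%' each _ also matches any single character, and it runs one dynamic SELECT per matching table, since a table name cannot be a bind variable.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def select_from_matching(conn: sqlite3.Connection, prefix: str,
                         column: str) -> list[tuple[str, object]]:
    """Collect (table_name, value) for `column` from every table named prefix + anything."""
    # Escape LIKE wildcards so '_' in the prefix matches a literal underscore.
    pattern = (prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
               + "%")
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name LIKE ? ESCAPE '\\' ORDER BY name", (pattern,))]
    out = []
    for tbl in tables:
        # One dynamic query per table; identifiers are quoted, not bound.
        sql = f"SELECT {quote_ident(column)} FROM {quote_ident(tbl)}"
        out.extend((tbl, v) for (v,) in conn.execute(sql))
    return out


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE backup_semaforo_geo_06391442 (calle TEXT);
    CREATE TABLE backup_semaforo_geo_06398164 (calle TEXT);
    CREATE TABLE backupXsemaforo_geo_1 (calle TEXT);
    INSERT INTO backup_semaforo_geo_06391442 VALUES ('Av. Uno');
    INSERT INTO backup_semaforo_geo_06398164 VALUES ('Calle Dos');
    INSERT INTO backupXsemaforo_geo_1 VALUES ('ignored');
""")
rows = select_from_matching(conn, "backup_semaforo_geo_", "calle")
for table_name, calle in rows:
    print(f"{table_name}: {calle}")
```

Without the escaping, backupXsemaforo_geo_1 would also match, because the unescaped _ after "backup" matches the X.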