How to find non-numeric columns containing only numeric data? - sql

I like to find all columns in my Oracle database schema that only contain numeric data but having a non-numeric type. (So basically column-candidates with probably wrong chosen data types.)
I have a query for all varchar2-columns:
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM user_tab_cols
WHERE DATA_TYPE = 'VARCHAR2';
Furthermore I have a query to check for any non-numeric data inside a table myTable and a column myColumn:
SELECT 1
FROM myTable
WHERE NOT REGEXP_LIKE(myColumn, '^[[:digit:]]+$');
I like to combine both queries in that way that the first query only returns the rows where not exists the second.
The main problem here is that the first query is on meta layer of the data dictionary where TABLE_NAME and COLUMN_NAME comes as data and I need that data as identifiers (and not as data) in the second query.
In pseudo-SQL I have something like that in mind:
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM user_tab_cols
WHERE DATA_TYPE = 'VARCHAR2'
AND NOT EXISTS
(SELECT 1 from asIdentifier(TABLE_NAME)
WHERE NOT REGEXP_LIKE(asIdentifier(COLUMN_NAME), '^[[:digit:]]+$'));

Create a function as this:
create or replace function isNumeric(val in VARCHAR2) return INTEGER AS
res NUMBER;
begin
res := TO_NUMBER(val);
RETURN 1;
EXCEPTION
WHEN OTHERS THEN
RETURN 0;
END;
Then you can use it like this:
DECLARE
r integer;
BEGIN
For aCol in (SELECT TABLE_NAME, COLUMN_NAME FROM user_tab_cols WHERE DATA_TYPE = 'VARCHAR2') LOOP
-- What about CHAR and CLOB data types?
execute immediate 'select count(*) from '||aCol.TABLE_NAME||' WHERE isNumeric('||aCol.COLUMN_NAME||') = 0' into r;
if r = 0 then
DBMS_OUTPUT.put_line(aCol.TABLE_NAME ||' '||aCol.COLUMN_NAME ||' contains numeric values only');
end if;
end loop;
end;
Note, the performance of this PL/SQL block will be poor. Hopefully this is a one-time-job only.

There are two possible approaches: dynamic SQL (DSQL) and XML.
First one was already demonstrated in another reply and it's faster.
XML approach just for fun
create or replace function to_number_udf(p in varchar2) return number
deterministic is
pragma udf;
begin
return p * 0;
exception when invalid_number or value_error then return 1;
end to_number_udf;
/
create table t_chk(str1, str2) as
select '1', '2' from dual union all
select '0001.1000', 'helloworld' from dual;
SQL> column owner format a20
SQL> column table_name format a20
SQL> column column_name format a20
SQL> with tabs_to_check as
2 (
3 select 'collection("oradb:/'||owner||'/'||table_name||'")/ROW/'||column_name||'/text()' x,
4 atc.*
5 from all_tab_columns atc
6 where table_name = 'T_CHK'
7 and data_type = 'VARCHAR2'
8 and owner = user
9 )
10 select --+ no_query_transformation
11 owner, table_name, column_name
12 from tabs_to_check ttc, xmltable(x columns "." varchar2(4000)) x
13 group by owner, table_name, column_name
14 having max(to_number_udf(".")) = 0;
OWNER TABLE_NAME COLUMN_NAME
-------------------- -------------------- --------------------
TEST T_CHK STR1
PS. On Oracle 12.2 you can use to_number(... default ... on conversion error) instead of UDF.

The faster way to check if a string is all digits vs. contains at least one non-digit character is to use the translate function. Alas, due to the non-SQL Standard way Oracle handles empty strings, the form of the function we must use is a little complicated:
translate(input_string, 'z0123456789', 'z')
(z can be any non-digit character; we need it so that the third argument is not null). This works by translating z to itself and 0, etc. to nothing. So if the input string was null or all-digits, and ONLY in that case, the value returned by the function is null.
In addition: to make the process faster, you can test each column with an EXISTS condition. If a column is not meant to be numeric, then in most cases the EXISTS condition will become true very quickly, so you will have to inspect a very small number of values from such columns.
As I tried to make this work, I ran into numerous side issues. Presumably you want to look in all schemas (except SYS and perhaps SYSTEM). So you need to run the procedure (anonymous block) from an account with SYSDBA privileges. Then - I ran into issues with non-standard table and column names (names starting with an underscore and such); which brought to mind identifiers defined in double-quotes - a terrible practice.
For illustration, I will use the HR schema - on which the approach worked. You may need to tweak this further; I wasn't able to make it work by changing the line
and owner = 'HR'
to
and owner != 'SYS'
So - with this long intro - here is what I did.
First, in a "normal" user account (my own, named INTRO - I run a very small database, with only one "normal" user, plus the Oracle "standard" users like SCOTT, HR etc.) - so, in schema INTRO, I created a table to receive the owner name, table name and column name for all columns of data type VARCHAR2 and which contain only "numeric" values or null (numeric defined the way you did.) NOTE HERE: If you then want to really check for all numeric values, you will indeed need a regular expression, or something like what Wernfried has shown; I would still, otherwise, use an EXISTS condition rather than a COUNT in the anonymous procedure.
Then I created an anonymous block to find the needed columns. NOTE: You will not have a schema INTRO - so change it everywhere in my code (both in creating the table and in the anonymous block). If the procedure completes successfully, you should be able to query the table. I show that at the end too.
While logged in as SYS (or another user with SYSDBA powers):
create table intro.cols_with_numbers (
owner_name varchar2(128),
table_name varchar2(128),
column_name varchar2(128)
);
declare x number;
begin
execute immediate 'truncate table intro.cols_with_numbers';
for t in ( select owner, table_name, column_name
from dba_tab_columns
where data_type like 'VARCHAR2%'
and owner = 'HR'
)
loop
execute immediate 'select case when exists (
select *
from ' || t.owner || '.' || t.table_name ||
' where translate(' || t.column_name || ',
''z0123456789'', ''z'') is not null
) then 1 end
from dual'
into x;
if x is null then
insert into intro.cols_with_numbers (owner_name, table_name, column_name)
values(t.owner, t.table_name, t.column_name);
end if;
end loop;
end;
/
Run this procedure and then query the table:
select * from intro.cols_with_numbers;
no rows selected
(which means there were no numeric columns in tables in the HR schema, in the wrong data type VARCHAR2 - or at least, no such columns that had only non-negative integer values.) You can test further, by intentionally creating a table with such a column and testing to see it is "caught" by the procedure.
ADDED - Here is what happens when I change the owner from 'HR' to 'SCOTT':
PL/SQL procedure successfully completed.
OWNER_NAME TABLE_NAME COLUMN_NAME
-------------------- -------------------- --------------------
SCOTT BONUS JOB
SCOTT BONUS ENAME
so it seems to work fine (although on other schemas I sometimes run into an error... I'll see if I can figure out what that is).
In this case the table is empty (no rows!) - this is one example of a "false positive" you may find. (More generally, you will get a false positive if everything in a VARCHAR2 column is null - in all rows of the table.)
NOTE also that a column may have only numeric values and still the best data type would be VARCHAR2. This is the case when the values are simply identifiers and are not meant as "numbers" (which we can compare to each other or to fixed values, and/or with which we can do arithmetic). Example - a SSN (Social Security Number) or the equivalent in other countries; the SSN is each person's "official" identifier for doing business with the government. The SSN is numeric (actually, perhaps to accentuate the fact it is NOT supposed to be a "number" despite the name, it is often written with a couple of dashes...)

Related

Need to build a query with column names stored in another table

I have a table as shown below. This table will be generated dynamically and I have no prior idea about what value it is going to hold.
------------------------------------------
TABLE_NAME COLUMN_NAME CHAR_LENGTH
------------------------------------------
EMPLOYEE COL1 100
EMPLOYEE COL2 200
EMPLOYEE COL3 300
EMPLOYEE COL4 400
Based on this table, I want to build a query in such a way that it would give me those columns, that contains data having char length greater than CHAR_LENGTH column value.
For example if COL2 contains data having char length 500 (>200), then query would give me COL2.
I don't have any draft code to show my attempt, as I have no idea how would I do this.
I don't think this is possible in pure SQL due to the dynamic nature of your requirement. You'll need some form of PL/SQL.
Assuming you're ok with simply outputting the desired results, here is a PL/SQL block that will get the job done:
declare
wExists number(1);
begin
for rec in (select * from your_dynamic_table)
loop
execute immediate 'select count(*)
from dual
where exists (select null
from ' || rec.table_name || ' t
where length(t.' || rec.column_name || ') > ' || rec.char_length || ')'
into wExists;
if wExists = 1 then
dbms_output.put_line(rec.column_name);
end if;
end loop;
end;
You'll also notice the use of the exists clause to optimize the query, so as not to iterate over the whole table unnecessarily, when possible.
Alternatively, if you want the results to be queryable, you can consider converting the code to a pipelined function.
select column_name
from (
select statement that builds the table output
) A
where char_length<length(column_name)
will that help?
You would need a procedure to achieve the same :
Here I am treating the all_tab_columns table in Oracle, which is a default table with much similar structure as your reference example.(Try select * from all_tab_columns). The structure of all_tab_columns is much like yours, except that you will never find a varchar record whose value has exceeded its data length(obvious database level constraint). Date fields may exceed data length, and do reflect in this procedure's output. I am searching all columns in EMPLOYEES whose size exceeds what is specified.
DECLARE
cursor c is select column_name,data_length,table_name from all_tab_columns where table_name=:Table_name;
V_INDEX_NAME all_tab_columns.column_name%type;
v_data_length all_tab_columns.data_length%type;
V_NUMBER PLS_INTEGER;
v_table_name all_tab_columns.table_name%type;
BEGIN
open c;
LOOP
FETCH c into v_index_name,v_data_length,v_table_name;
EXIT when c%NOTFOUND;
v_number :=0;
execute immediate 'select count(*) from '|| :Table_name ||' where length('||v_index_name||')>'||v_data_length into v_number;
if v_number>1 then
dbms_output.put_line(v_index_name||' has values greater than specified'||' '||V_INDEX_NAME||' '||v_data_length);
end if;
END LOOP;
close c;
END;
/
Replace all_tab_columns and its respective columns with the column name of your table.
DEFECTS : The table name is hardcoded. Trying to make the code generic execute immediate or any other trick. Will achieve soon.
EDIT : Defect fixed.

How to choose tables on select from all_tables?

I have the following table name template, there are a couple with the same name and a number at the end: fmj.backup_semaforo_geo_THENUMBER, for example:
select * from fmj.backup_semaforo_geo_06391442
select * from fmj.backup_semaforo_geo_06398164
...
Lets say I need to select a column from every table which succeeds with the 'fmj.backup_semaforo_geo_%' filter, I tried this:
SELECT calle --This column is from the backup_semaforo_geo_# tables
FROM (SELECT table_name
FROM all_tables
WHERE owner = 'FMJ' AND table_name LIKE 'BACKUP_SEMAFORO_GEO_%');
But I'm getting the all_tables tables name data:
TABLE_NAME
----------
BACKUP_SEMAFORO_GEO_06391442
BACKUP_SEMAFORO_GEO_06398164
...
How can I achieve that without getting the all_tables output?
Thanks.
Presumably your current query is getting ORA-00904: "CALLE": invalid identifier, because the subquery doesn't have a column called CALLE. You can't provide a table name to a query at runtime like that, unfortunately, and have to resort to dynamic SQL.
Something like this will loop through all the tables and for each one will get all the values of CALLE from each one, which you can then loop through. I've used DBMS_OUTPUT to display them, assuming you're doing this in SQL*Plus or something that can deal with that; but you may want to do something else with them.
set serveroutput on
declare
-- declare a local collection type we can use for bulk collect; use any table
-- that has the column, or if there isn't a stable one use the actual data
-- type, varchar2(30) or whatever is appropriate
type t_values is table of table.calle%type;
-- declare an instance of that type
l_values t_values;
-- declare a cursor to generate the dynamic SQL; where this is done is a
-- matter of taste (can use 'open x for select ...', then fetch, etc.)
-- If you run the query on its own you'll see the individual selects from
-- all the tables
cursor c1 is
select table_name,
'select calle from ' || owner ||'.'|| table_name as query
from all_tables
where owner = 'FMJ'
and table_name like 'BACKUP_SEMAFORO_GEO%'
order by table_name;
begin
-- loop around all the dynamic queries from the cursor
for r1 in c1 loop
-- for each one, execute it as dynamic SQL, with a bulk collect into
-- the collection type created above
execute immediate r1.query bulk collect into l_values;
-- loop around all the elements in the collection, and print each one
for i in 1..l_values.count loop
dbms_output.put_line(r1.table_name ||': ' || l_values(i));
end loop;
end loop;
end;
/
May be a dynamic SQL in a PLSQL program;
for a in (SELECT table_name
FROM all_tables
WHERE owner = 'FMJ' AND table_name LIKE 'BACKUP_SEMAFORO_GEO_%')
LOOP
sql_stmt := ' SELECT calle FROM' || a.table_name;
EXECUTE IMMEDIATE sql_stmt;
...
...
END LOOP;

Check a whole table for a single value

Background: I'm converting a database table to a format that doesn't support null values. I want to replace the null values with an arbitrary number so my application can support null values.
Question: I'd like to search my whole table for a value ("999999", for example) to make sure that it doesn't appear in the table. I could write a script to test each column individually, but I wanted to know if there is a way I could do this in pure sql without enumerating each field. Is that possible?
You can use a special feature of the PostgreSQL type system:
SELECT *
FROM tbl t
WHERE t::text LIKE '%999999%';
There is a composite type of the same name for every table that you create in PostgreSQL. And there is a text representation for every type in PostgreSQL (to input / output values).
Therefore you can just cast the whole row to text and if the string '999999' is contained in any column (its text representation, to be precise) it is guaranteed to show in the query above.
You cannot rule out false positives completely, though, if separators and / or decorators used by Postgres for the row representation can be part of the search term. It's just very unlikely. And positively not the case for your search term '999999'.
There was a very similar question on codereview.SE recently. I added some more explanation in my answer there.
create or replace function test_values( real ) returns setof record as
$$
declare
query text;
output record;
begin
for query in select 'select distinct ''' || table_name || '''::text table_name, ''' || column_name || '''::text column_name from '|| quote_ident(table_name)||' where ' || quote_ident(column_name) || ' = ''' || $1::text ||'''::' || data_type from information_schema.columns where table_schema='public' and numeric_precision is not null
loop
raise notice '%1 qqqq', query;
execute query::text into output;
return next output;
end loop;
return;
end;$$ language plpgsql;
select distinct * from test_values( 999999 ) as t(table_name text ,column_name text)

Oracle/SQL - Using query results as paramaters in another query -looping?

Hi everyone what I'm wondering if I can do is create a table that lists the record counts of other tables. It would get those table names from a table. So let's assume I have the table TABLE_LIST that looks like this
name
---------
sports_products <-- contains 10 records
house_products <-- contains 8 records
beauty_products <-- contains 15 records
I would like to write a statement that pulls the names from those tables to query them and coount the records and ultimately produce this table
name numRecords
------------------------------
sports_products 10
house_products 8
beauty_products 15
So I think I would need to do something like this pseudo code
select *
from foreach tableName in select name from table_list
select count(*) as numRecords
from tableName
loop
You can have a function that is doing this for you via dynamic sql.
However, make sure to declare it as authid current_user. You do not want anyone to gain some sort of privilege elevation by exploiting your function.
create or replace function SampleFunction
(
owner in VarChar
,tableName in VarChar
) return integer authid current_user is
result Integer;
begin
execute immediate 'select count(*) from "' || owner || '"."' || tableName || '"'
INTO result;
return result;
end;
One option is to simply keep your DB statistics updated, use dbms_stats package or EM, and then
select num_rows
from all_tables
where table_name in (select name from table_list);
I think Robert Giesecke solution will work fine.
A more exotic way of solving this is by using dbms_xmlgen.getxml.
See for example: Identify a table with maximum rows in Oracle

Normalize columns names in ORACLE 11g

I need remove the quotes from the names of the columns in many tables in my schema. there is any way to automate this process?, any function in oracle or some tool that allows me to change the names of the columns removing the quotes. I am using oracle 11g.
UPDATE
I'm sorry, I had to rephrase my question.
thanks in advance.
I assume here by "fields" you mean "column names".
Keep in mind that column names in Oracle are not case sensitive unless you put them in quotes when creating the table. It's generally not a good idea to use quotes around the column names when creating the table. In other words, if you create the table like this:
CREATE TABLE FOO (
colUMN1 varchar2(10),
CoLumn2 number(38)
)
Then you can still run select statements like this:
SELECT column1, column2 FROM FOO
You can also do this:
SELECT COLUMN1, COLUMN2 FROM FOO
Also note that if you run this query, you'll see that Oracle stored the column names as uppercase in the data dictionary:
SELECT * FROM ALL_TAB_COLUMNS WHERE TABLE_NAME = 'FOO'
So there's no need to rename these columns to all uppercase. The queries you write can use all uppercase column names (assuming the tables weren't created using quotes around the column names) and they'll work fine. It's generally a bad idea to try to force them to be case sensitive.
If you just want to get rid of all the case sensitive column names
SQL> create table foo ( "x" number );
Table created.
SQL> ed
Wrote file afiedt.buf
1 begin
2 for x in (select *
3 from user_tab_cols
4 where column_name != UPPER(column_name))
5 loop
6 execute immediate 'ALTER TABLE ' || x.table_name ||
7 ' RENAME column "' || x.column_name || '"' ||
8 ' TO ' || upper(x.column_name);
9 end loop;
10* end;
SQL> /
PL/SQL procedure successfully completed.
SQL> desc foo
Name Null? Type
----------------------------------------- -------- ----------------------------
X NUMBER