Oracle: Count non-null fields for each column in a table - sql

I need a query to count the total number of non-null values for each column in a table. Since my table has hundreds of columns I'm looking for a solution that only requires me to input the table name.
Perhaps using the result of:
select COLUMN_NAME from ALL_TAB_COLUMNS where TABLE_NAME='ORDERS';
to get the column names and then a subquery to put counts against each column name? The additional complication is that I only have read-only access to the DB so I can't create any temp tables.
Slightly out of my league with this one so any help is appreciated.

Construct the query in SQL or using a spreadsheet. Then run the query.
For instance, assuming that your column names are simple and don't have special characters:
select replace('select ''[col]'', count([col]) from orders union all ',
'[col]', COLUMN_NAME
) as sql
from ALL_TAB_COLUMNS
where TABLE_NAME = 'ORDERS';
(Of course, this can be adapted for more complex column names, but I'm trying to show the idea.)
Then copy the code, remove the final union all and run it.
You can put this in one string if there are not too many columns:
select listagg(replace('select ''[col]'', count([col]) from orders',
'[col]', COLUMN_NAME
), ' union all '
) within group (order by column_name) as sql
from ALL_TAB_COLUMNS
where TABLE_NAME = 'ORDERS';
You can also use execute immediate using the same query, but that seems like overkill.
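If a client program is available, the generate-then-run idea above can be driven from there instead of copying text around. The following is only a minimal sketch under stated assumptions: it uses Python with an in-memory SQLite table standing in for ORDERS, and `build_count_query` is a hypothetical helper name, not anything from Oracle. Against Oracle the column list would come from ALL_TAB_COLUMNS and the query would run through an Oracle driver.

```python
import sqlite3


def build_count_query(table, columns):
    """Build one UNION ALL query returning (column_name, non_null_count) rows."""
    # Assumes simple identifiers that need no quoting, as in the answer above.
    parts = [
        f"select '{col}' as col, count({col}) as cnt from {table}"
        for col in columns
    ]
    # join() places ' union all ' only between parts, so nothing trails at the end.
    return " union all ".join(parts)


# Hypothetical demo data: an in-memory SQLite table standing in for ORDERS.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (id integer, customer text, shipped_on text)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [(1, "alice", None), (2, None, None), (3, "carol", "2024-01-05")],
)

query = build_count_query("orders", ["id", "customer", "shipped_on"])
counts = dict(conn.execute(query).fetchall())
print(counts)
```

Joining the pieces in the client removes the manual step of deleting the final union all.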

If you're happy with the results in rows rather than columns:
SELECT 'SELECT ''dummy'', 0 FROM DUAL' FROM DUAL
UNION ALL
SELECT
' UNION ALL SELECT ''' ||
column_name ||
''', COUNT(' ||
column_name ||
') FROM ' ||
TABLE_NAME
FROM
all_tab_columns
WHERE
table_name = 'ORDERS'
This is an "SQL that writes an SQL" that you can then copy and run to get your answers. Should make a resultset that looks like:
SELECT 'dummy', 0 FROM dual
UNION ALL SELECT 'col1', COUNT(col1) FROM ORDERS
UNION ALL SELECT 'col2', COUNT(col2) FROM ORDERS
...
If you want your results in columns:
SELECT 'SELECT ' FROM DUAL
UNION ALL
SELECT
'COUNT(' ||
column_name ||
') as count_' ||
column_name ||
', '
FROM
all_tab_columns
WHERE
table_name = 'ORDERS'
UNION ALL
SELECT 'null as dummy_column FROM ORDERS' FROM DUAL
Should make a resultset that looks like:
SELECT
COUNT(col1) as count_col1,
COUNT(col2) as count_col2,
...
null as dummy_column FROM ORDERS
Caveat: I don't have Oracle installed anywhere I can test these; they are written from memory and may need some debugging.

This will generate the SQL to get the counts in columns and will handle case sensitive column names and column names with non-alpha-numeric characters:
SELECT 'SELECT '
|| LISTAGG(
'COUNT("' || column_name || '") AS "' || column_name || '"',
', '
) WITHIN GROUP ( ORDER BY column_id )
|| ' FROM "' || table_name || '"' AS sql
FROM ALL_TAB_COLUMNS
WHERE TABLE_NAME = 'ORDERS'
GROUP BY TABLE_NAME;
or, if you have a large number of columns and the generated string is longer than 4000 characters, you can use a custom aggregation function (CLOBAgg here) to aggregate VARCHAR2s into a CLOB and then do:
SELECT 'SELECT '
|| CLOBAgg( 'COUNT("' || column_name || '") AS "' || column_name || '"' )
|| ' FROM "' || table_name || '"' AS sql
FROM ALL_TAB_COLUMNS
WHERE TABLE_NAME = 'ORDERS'
GROUP BY TABLE_NAME;
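Building the statement in a client is another way to sidestep the string-length limit, since the text never passes through LISTAGG. The sketch below is an illustration only: it uses Python with SQLite as a stand-in database, and `quote_ident` and `build_wide_count_query` are hypothetical helper names. It double-quotes every identifier as the answer above does, and rejects names containing a double quote, which Oracle does not allow in identifiers.

```python
import sqlite3


def quote_ident(name):
    """Wrap an identifier in double quotes so case and odd characters survive."""
    if '"' in name:
        # Oracle identifiers cannot contain a double quote, so refuse such names.
        raise ValueError(f"unsupported identifier: {name!r}")
    return '"' + name + '"'


def build_wide_count_query(table, columns):
    """Build a one-row query with one COUNT per column, aliased to the column name."""
    select_list = ", ".join(
        f"COUNT({quote_ident(c)}) AS {quote_ident(c)}" for c in columns
    )
    return f"SELECT {select_list} FROM {quote_ident(table)}"


# Hypothetical demo table with a mixed-case name and awkward column names.
conn = sqlite3.connect(":memory:")
conn.execute('create table "Orders" ("Order Id" integer, "ship-date" text)')
conn.executemany('insert into "Orders" values (?, ?)', [(1, None), (2, "2024-01-05")])

sql = build_wide_count_query("Orders", ["Order Id", "ship-date"])
cur = conn.execute(sql)
names = [d[0] for d in cur.description]
wide_counts = dict(zip(names, cur.fetchone()))
print(wide_counts)
```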

In Oracle 19 this works without generating a second SELECT to execute (I used similar code in Oracle 12, so it may work there too):
select * from
(
select table_name, column_name,
to_number( extractvalue( xmltype(dbms_xmlgen.getxml('select count(to_char(substr('||column_name||',1,1))) c from '||table_name)) ,'/ROWSET/ROW/C')) count
from all_tab_columns where owner = user
)
--where table_name = 'MY_TABLE'
;
It builds an XML document containing the count and then extracts the value from it. The substr and to_char functions take only the first character, so this also works with CLOB columns.

Related

How to Get max created date for tables selected from all_tab_columns

I have selected (from all_tab_columns) some tables and columns of DATE datatype. I would like to know the MAX date for each table. Could you help me out with a dynamic way to do it? Table names and column names could be different depending on my where clause when selecting from all_tab_columns.
Sample data:
WITH
tabs (TABLE_NAME, COLUMN_NAME, DATA_TYPE) AS
(
Select 'A_ZR_6', 'CREATED_DATE', 'DATE' From dual Union All
Select 'A_ZR_8', 'CREATEDDATE', 'DATE' From dual Union All
Select 'A_ZR_2', 'CREATED_DATE', 'DATE' From dual Union All
Select 'A_ZR_4', 'CREATED_DATE', 'DATE' From dual Union All
Select 'A_ZR_9', 'CREATED_DATE', 'DATE' From dual
)
TABLE_NAME  COLUMN_NAME   DATA_TYPE
----------  ------------  ---------
A_ZR_6      CREATED_DATE  DATE
A_ZR_8      CREATEDDATE   DATE
A_ZR_2      CREATED_DATE  DATE
A_ZR_4      CREATED_DATE  DATE
A_ZR_9      CREATED_DATE  DATE
The expected result should look like here:
TABLE_NAME  MAX_DATE
----------  ---------
A_ZR_6      07-NOV-22
A_ZR_8      12-DEC-22
A_ZR_2      03-OCT-22
A_ZR_4      01-NOV-22
A_ZR_9      31-DEC-22
CODE:
select
table_name,
column_name
from
all_tab_columns
where
owner='ABC' and
table_name not like 'V_%' and
lower(column_name) like '%create%' and
lower(column_name) like '%date%'
group by
table_name,
column_name
I believe the only way to do this is a PL/SQL program, because a table name in an Oracle SQL statement cannot be dynamic:
SET LINE 100;
SET SERVEROUTPUT ON;
DECLARE
CURSOR QCursor is
SELECT
X.table_name,
'SELECT MAX(' || X.column_name || ') from ' || owner || '.' || X.table_name as query
FROM (
SELECT
owner,
table_name,
column_name
FROM
all_tab_columns
WHERE
owner = 'ABC'
AND table_name not like 'V_%'
AND lower(column_name) like '%create%'
AND lower(column_name) like '%date%'
GROUP BY owner,
table_name,
column_name
) X;
-- Row variable for the cursor
ROW_QCursor QCursor%ROWTYPE;
v_max_date DATE;
BEGIN
DBMS_OUTPUT.ENABLE(1000000);
OPEN QCursor;
LOOP
FETCH QCursor INTO ROW_QCursor;
EXIT WHEN QCursor%NOTFOUND;
EXECUTE IMMEDIATE ROW_QCursor.query INTO v_max_date;
DBMS_OUTPUT.PUT_LINE('Table_name :' || ROW_QCursor.table_name || ' Max date is : ' || v_max_date);
END LOOP;
CLOSE QCursor;
END;
/
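The cursor loop above (one dynamic statement per table, with the result collected as you go) can also be written in a client program. This is a rough sketch rather than the answer's method: it uses Python with in-memory SQLite tables in place of the ABC schema, stores dates as ISO text so that MAX orders them correctly, and `max_per_table` is a hypothetical helper name.

```python
import sqlite3


def max_per_table(conn, targets):
    """Run SELECT MAX(column) FROM table for each (table, column) pair."""
    results = {}
    for table, column in targets:
        # Identifiers cannot be bound as parameters, so they are spliced into the
        # text. Only do this with names read from the data dictionary.
        sql = f"SELECT MAX({column}) FROM {table}"
        results[table] = conn.execute(sql).fetchone()[0]
    return results


# Hypothetical demo tables; dates are ISO strings because SQLite has no DATE type.
conn = sqlite3.connect(":memory:")
conn.execute("create table a_zr_6 (created_date text)")
conn.execute("create table a_zr_8 (createddate text)")
conn.executemany("insert into a_zr_6 values (?)", [("2022-10-01",), ("2022-11-07",)])
conn.executemany("insert into a_zr_8 values (?)", [("2022-12-12",), ("2022-03-15",)])

# In Oracle this list would come from the all_tab_columns query in the question.
targets = [("a_zr_6", "created_date"), ("a_zr_8", "createddate")]
max_dates = max_per_table(conn, targets)
print(max_dates)
```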
Alternatively, if you are running things by hand (e.g. in SQL Developer), you can run this query:
select
X.table_name,
'SELECT MAX(' || X.column_name || ') from ' || owner || '.' || X.table_name as query
from (
select
owner,
table_name,
column_name
from
all_tab_columns
where
owner = 'ABC'
and table_name not like 'V_%'
and lower(column_name) like '%create%'
and lower(column_name) like '%date%'
group by owner,
table_name,
column_name
) X;
and then copy the output and execute manually:
TABLE_ QUERY
------ ----------------------------------------
A_ZR_2 SELECT MAX(CREATED_DATE) from ABC.A_ZR_2
A_ZR_9 SELECT MAX(CREATED_DATE) from ABC.A_ZR_9
A_ZR_8 SELECT MAX(CREATED_DATE) from ABC.A_ZR_8
A_ZR_6 SELECT MAX(CREATED_DATE) from ABC.A_ZR_6
A_ZR_4 SELECT MAX(CREATED_DATE) from ABC.A_ZR_4
I hope I was helpful.
Thank you.

Get count of rows from multiple tables Redshift SQL?

I have a redshift database that is being updated with new tables so I can't just manually list the tables I want. I want to get a count of the rows of all the tables from my query. So far I have:
select 'SELECT ''' || table_name || ''' as table_name, count(*) As con ' ||
'FROM ' || table_name ||
CASE WHEN lead(table_name) OVER (order by table_name ) IS NOT NULL
THEN ' UNION ALL ' END
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME LIKE '%results%'
but when I do this I get the error:
Specified types or functions (one per INFO message) not supported on Redshift tables.
I've searched a lot but I can't seem to find a solution for my problem. Any help would be greatly appreciated. Thanks!
EDIT:
I've changed my approach to this and decided to use a for loop in R to get the row count of each table, but I'm running into the issue that 'row_count' is only saving one number, not the count for each table like I want. Here is the code:
schema <- "x"
table_prefix <- "results"
geos <- ad_districts %>% filter(geo != "geo")
row_count <- list()
i = 1
for (geo in geos){
table_name <- paste0(schema, ".", table_prefix, geo)
row_count[[i]] <- dbGetQuery(con,
paste("SELECT COUNT(*) FROM", table_name))
i = i + 1
}
Your query runs a count(*) against every table, which will take a lot of time and resources. Instead, use a system table to get the same info:
select name, sum(rows) as rows
from stv_tbl_perm
where name like '%results%'
group by 1
[EDIT] - I think this is the root cause - some sql functions are only supported on the leader node. Try connecting to that node and re-run your SQL.
https://docs.aws.amazon.com/redshift/latest/dg/c_sql-functions-leader-node.html
Hope this helps.
select 'select count(*) as "' || table_schema || '.' || table_name || '" from ' || table_schema || '.' || table_name || ' ;' as sql_text
from information_schema.tables
;
[EDIT - refined this a bit to generate a series of statements that can be run at once]
select rownum, case when rownum > 1 then sql_text else replace(sql_text, 'union all', '') end as sql_text
from
(
select rank() over (order by sql_text DESC) as rownum,
sql_text
from
(
select 'select ''' || table_schema || ' ' || table_name || ''' , count(*) as "' || table_schema || '.' || table_name || '" from ' || table_schema || '.' || table_name || ' union all ' as sql_text
from information_schema.tables
where table_schema = 'public'
order by table_schema, table_name
)X
)Y
order by rownum desc ;
SELECT ' Select count(*) , '''+ tablename + ''' from '+'"' + tablename +'"' +' Union ALL '
FROM pg_table_def
GROUP BY tablename
The query above double-quotes each table name, so table names containing spaces are handled. Remove the UNION ALL at the end of the generated text and the query is ready to be executed.
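For the loop-in-a-client approach from the question's edit, the key points are to read the table names from the catalog and to store each count under its table name. Below is a minimal sketch, assuming Python rather than R and an in-memory SQLite database in place of Redshift: `sqlite_master` plays the role of information_schema.tables, and `count_rows_matching` is a hypothetical helper name.

```python
import sqlite3


def count_rows_matching(conn, pattern):
    """Return {table_name: row_count} for tables whose name matches a LIKE pattern."""
    # Materialise the name list first so the catalog cursor is finished
    # before the per-table queries run.
    tables = [
        name
        for (name,) in conn.execute(
            "select name from sqlite_master "
            "where type = 'table' and name like ? order by name",
            (pattern,),
        )
    ]
    counts = {}
    for name in tables:
        # Double-quote the name so spaces or mixed case do not break the query.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"select count(*) from {quoted}").fetchone()[0]
    return counts


# Hypothetical demo tables.
conn = sqlite3.connect(":memory:")
conn.execute("create table results_north (id integer)")
conn.execute("create table results_south (id integer)")
conn.execute("create table other_data (id integer)")
conn.executemany("insert into results_north values (?)", [(1,), (2,)])
conn.execute("insert into other_data values (1)")

row_counts = count_rows_matching(conn, "%results%")
print(row_counts)
```

New tables that match the pattern are picked up on the next run without editing the code.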

Get column names in subquery, and then return values for those columns?

Seems like this is impossible, but I'm so close - maybe someone can take me the last step...
I have a bunch of dynamic code and I don't always know the tables and columns I'm going to be dealing with, but I do know that VARCHAR2 columns with data_lengths of 2000 result in errors. I'd love to be able to identify these 'bad' columns dynamically, and remove them from my results in 1 shot.
This code:
SELECT LISTAGG(probs.column_name, ', ')
WITHIN GROUP (ORDER BY column_name) FROM
(select 1 grp, column_name
from all_tab_columns
where TABLE_NAME = 'MYTABLE' AND
DATA_TYPE <> 'VARCHAR2' AND
DATA_LENGTH < 2000
) probs
GROUP BY GRP
Gives me a nice comma-separated list of all of my acceptable column names like this:
FIELD1, FIELD2, FIELD3, FIELD4...
And I am hopeful that there's a way I can simply drop that list of field names into a select statement like this:
SELECT (<my subquery, above>)
FROM MYTABLE;
Is this possible?
Assuming this situation
create table mytable ( a number, b number, c number)
insert into mytable values (10, 20, 30)
insert into mytable values (1, 2, 3)
and that only one table with that name exists (otherwise you should specify the owner in the query on all_tab_columns), your query could be simplified this way:
SELECT 'select ' || LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_name) || ' from ' || table_name
FROM all_tab_columns
WHERE TABLE_NAME = 'MYTABLE'
AND DATA_TYPE <> 'VARCHAR2'
AND DATA_LENGTH < 2000
GROUP BY table_name
this would give: select A, B, C from MYTABLE.
The problem here is that you can not simply run a statement that returns a variable number of columns; one way to use this could be building an xml:
SELECT xmltype(
DBMS_XMLGEN.getxml(
( SELECT 'select ' || LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_name) || ' from ' || table_name
FROM all_tab_columns
WHERE TABLE_NAME = 'MYTABLE'
AND DATA_TYPE <> 'VARCHAR2'
AND DATA_LENGTH < 2000
GROUP BY table_name)
)
)
FROM DUAL
<?xml version="1.0"?>
<ROWSET>
<ROW>
<A>10</A>
<B>20</B>
<C>30</C>
</ROW>
<ROW>
<A>1</A>
<B>2</B>
<C>3</C>
</ROW>
</ROWSET>
Another way could be to use some PL/SQL and dynamic SQL, with a small modification of your query so that it concatenates the fields and builds each result row as a single string:
declare
type tTabResults is table of varchar2(1000);
vSQL varchar2(1000);
vTabResults tTabResults;
begin
SELECT 'select ' || LISTAGG( column_name, '|| '', '' ||') WITHIN GROUP (ORDER BY column_name) || ' from ' || table_name
into vSQL
FROM all_tab_columns
WHERE TABLE_NAME = 'MYTABLE'
AND DATA_TYPE <> 'VARCHAR2'
AND DATA_LENGTH < 2000
GROUP BY table_name;
--
execute immediate vSQL bulk collect into vTabResults;
--
for i in vTabResults.first .. vTabResults.last loop
dbms_output.put_line(vTabResults(i));
end loop;
end;
10, 20, 30
1, 2, 3
Notice that I oversimplified the problem, treating numbers as strings and not using any conversion, by simply printing the values in your table, no matter their type; in a real solution you should handle the possible types of your columns and modify the initial query to add some type conversions.
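A client program is not bound by the fixed-column restriction, because it can read the column names from the cursor after the statement runs. The sketch below illustrates that with Python and SQLite, which is an assumption on my part and not part of the answer above. `PRAGMA table_info` stands in for all_tab_columns, the filter is simplified to the declared type only, and `select_allowed_columns` is a hypothetical helper name.

```python
import sqlite3


def select_allowed_columns(conn, table, excluded_type="VARCHAR2"):
    """Select every column of `table` whose declared type is not `excluded_type`."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    cols = [
        row[1]
        for row in conn.execute(f"PRAGMA table_info({table})")
        if not row[2].upper().startswith(excluded_type)
    ]
    cur = conn.execute(f"SELECT {', '.join(cols)} FROM {table}")
    # The cursor reports however many columns the generated statement returned.
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()


# Hypothetical demo table: three numeric columns plus one 'bad' VARCHAR2 column.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (a number, b number, c number, notes varchar2(2000))")
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [(10, 20, 30, "x"), (1, 2, 3, "y")],
)

header, rows = select_allowed_columns(conn, "mytable")
print(header)
for r in rows:
    print(r)
```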

Comparing column by column between two rows in Oracle DB

I need to write a query to compare column by column (ie: find differences) between two rows in the database. For example:
row1: 10 40 sometext 24
row2: 10 25 sometext 24
After the query executes, it should show only the fields that differ (i.e. the second field).
Here's what I have done so far:
select table1.column1, table1.column2, table1.column3, table1.column4
from table1
where somefield in (field1, field2);
The above query will show me two rows one above another like this:
10 40 sometext 24
10 25 sometext 24
Then I have to do the comparison manually, which takes a lot of time because the rows contain a lot of columns.
So again my question is: how can I write a query that shows me only the columns that differ?
Thanks
Use the UNPIVOT clause (see http://www.oracle-developer.net/display.php?id=506) to turn columns into rows, then filter out the rows that are the same (using GROUP BY ... HAVING COUNT), and finally use PIVOT to get rows with only the differing columns.
To do this easily you need to query the metadata for the table to get each column. You can use the following code as a script.
Replace the test_tab_name define with your table name and set yes_drop_it = NO. Put your raw WHERE syntax into where_clause. The comparison logic always compares the first two rows returned for the WHERE clause.
whenever sqlerror exit failure rollback;
set linesize 150
define test_tab_name = tst_cf_cols
define yes_drop_it = YES
define order_by = 1, 2
define where_clause = 1 = 1
define tab_owner = user
<<clearfirst>> begin
for clearout in (
select 'drop table ' || table_name as cmd
from all_tables
where owner = &&tab_owner and table_name = upper('&&test_tab_name')
and '&&yes_drop_it' = 'YES'
) loop
execute immediate clearout.cmd;
execute immediate '
create table &&test_tab_name as
select 10 as column1, 40 as column2, ''sometext'' as column3, 24 as column4 from dual
union all
select 10 as column1, 25 as column2, ''sometext'' as column3, 24 as column4 from dual
';
end loop;
end;
/
column cfsynt format a4000 word_wrap new_value comparison_syntax
with parms as (select 'parmquery' as cte_name, 'row_a' as corr_name_1, 'row_b' as corr_name_2 from dual)
select
'select * from (select ' || LISTAGG(cfcol || ' AS cf_' || trim (to_char (column_id, '000')) || '_' || column_name
, chr(13) || ', ') WITHIN GROUP (order by column_id)
|| chr(13) || ' from (select * from parmquery where row_number = 1) ' || corr_name_1
|| chr(13) || ', (select * from parmquery where row_number = 2) ' || corr_name_2
|| chr(13) || ') where ''DIFFERENT'' IN (' || LISTAGG ('cf_' || trim (to_char (column_id, '000')) || '_' || column_name, chr(13) || ', ') within group (order by column_id) || ')'
as cfsynt
from parms, (
select
'decode (' || corr_name_1 || '.' || column_name || ', ' || corr_name_2
|| '.' || column_name || ', ''SAME'', ''DIFFERENT'')'
as cfcol,
column_name,
column_id
from
parms,
all_tab_columns
where
owner = &&tab_owner and table_name = upper ('&&test_tab_name')
);
with parmquery as (select rownum as row_number, vals.* from (
select * from &&test_tab_name
where &&where_clause
order by &&order_by
) vals
) &&comparison_syntax
;
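If the two rows have already been fetched into a client, the same column-by-column comparison takes only a few lines. This is a minimal sketch in Python, assuming each row is available as a dictionary keyed by column name. `differing_columns` is a hypothetical helper name. Like DECODE in the script above, it treats two missing values (None) as the same.

```python
def differing_columns(row_a, row_b):
    """Return {column: (value_a, value_b)} for the columns whose values differ."""
    return {
        col: (row_a[col], row_b[col])
        for col in row_a
        if row_a[col] != row_b[col]
    }


# The two example rows from the question.
row1 = {"column1": 10, "column2": 40, "column3": "sometext", "column4": 24}
row2 = {"column1": 10, "column2": 25, "column3": "sometext", "column4": 24}

diff = differing_columns(row1, row2)
print(diff)  # {'column2': (40, 25)}
```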

trying to execute a query within a query

I have a query that displays another query I need to execute:
So the first part just writes out as text the first part of the query I want to execute
SELECT distinct 'SELECT COUNT(txn_id) FROM '
I then add on all the tables that I want to execute that initial part of the query on
table_name from all_tab_columns WHERE OWNER='RGSWKF_PRGM' AND COLUMN_NAME like '%TXN_ID%';
So my complete query is
SELECT distinct 'SELECT COUNT(txn_id) FROM ' || table_name from all_tab_columns WHERE OWNER='RGSWKF_PRGM' AND COLUMN_NAME like '%TXN_ID%';
This gives me a list of the queries I want to execute like so:
SELECT COUNT(txn_id) FROM MEETING_TXN_LIST
SELECT COUNT(txn_id) FROM TXN_COMMENT
SELECT COUNT(txn_id) FROM TXN_DEAL_FEE
....etc. I was told once I have this result I can auto execute the queries that are created as result of this by adding something to my original query but I can't find anything as of yet?
So basically I want it to execute from one query:
SELECT COUNT(txn_id) FROM MEETING_TXN_LIST
then
SELECT COUNT(txn_id) FROM TXN_COMMENT
then
SELECT COUNT(txn_id) FROM TXN_DEAL_FEE
etc. all in one query.
union
using union all with single quotes gives me the result with the text
SELECT COUNT(txn_id) FROM TXN_COMMENT union all ..etc...
Without the single quotes gives me the following error
ORA-00936: missing expression
00936. 00000 - "missing expression"
I would suggest that you generate a query with the subqueries connected by union all:
SELECT 'MEETING_TXN_LIST' as table_name, COUNT(txn_id) as cnt FROM MEETING_TXN_LIST UNION ALL
SELECT 'TXN_COMMENT', COUNT(txn_id) FROM TXN_COMMENT union all
SELECT 'TXN_DEAL_FEE', COUNT(txn_id) FROM TXN_DEAL_FEE;
The query for this is basically:
SELECT 'SELECT ''' || table_name || ''' as table_name, COUNT(txn_id) as cnt
FROM ' || table_name || ' union all '
from all_tab_columns
WHERE OWNER = 'RGSWKF_PRGM' AND COLUMN_NAME = 'TXN_ID';
Note that you need to remove the final union all from the last row. And, I changed the query to look only for the column TXN_ID, because that is what you are using in the queries.
select txt
|| case
when row_number() over (order by rn desc) = 1 then null
else ul
end
from (select 'SELECT '''
|| table_name
|| ''' as table_name, COUNT(txn_id) as cnt FROM '
|| table_name
as txt
,' union all ' ul
,rownum rn
from all_tab_columns
where OWNER = 'RGSWKF_PRGM' and COLUMN_NAME = 'TXN_ID')
order by rn