Count number of null values for every column on a table - sql

I would like to calculate, for each column in a table, the percent of rows that are null.
For one column, I was using:
SELECT ((SELECT COUNT(Col1)
FROM Table1)
/
(SELECT COUNT(*)
FROM Table1)) AS Table1Stats
Works great and is fast.
However, I want to do this for all ~50 columns of the table, and my environment does not allow me to use dynamic SQL.
Any recommendations? I am using snowflake to connect to AWS, but as an end user I am using the snowflake browser interface.

You can combine this as:
SELECT COUNT(Col1) * 1.0 / COUNT(*)
FROM Table1;
Or, if you prefer:
SELECT AVG( (Col1 IS NOT NULL)::INT )
FROM Table1;

You can use a mix of object_construct() and flatten() to move the column names into rows. Then do the math for the values missing:
create or replace temp table many_cols as
select 1 a, 2 b, 3 c, 4 d
union all select 1, null, 3, 4
union all select 8, 8, null, null
union all select 8, 8, 7, null
union all select null, null, null, null;
select key column_name
, 1-count(*)/(select count(*) from many_cols) ratio_null
from (
select object_construct(a.*) x
from many_cols a
), lateral flatten(x)
group by key
;

You can do this using a SQL generator if you don't mind copying the text and running it once it's done.
-- SQL generator option:
select 'select' || listagg(' ((select count(' || COLUMN_NAME || ') from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10000"."ORDERS") / ' ||
'(select count(*) from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10000"."ORDERS")) as ' || COLUMN_NAME, ',') as SQL_STATEMENT
from "SNOWFLAKE_SAMPLE_DATA"."INFORMATION_SCHEMA"."COLUMNS"
where TABLE_CATALOG = 'SNOWFLAKE_SAMPLE_DATA' and TABLE_SCHEMA = 'TPCH_SF10000' and TABLE_NAME = 'ORDERS'
;
If the copy and paste is not plausible because you need to script it, you can use the results of the SQL generator in a stored procedure I wrote to execute a single line of dynamic SQL:
call run_dynamic_sql(
select 'select' || listagg(' ((select count(' || COLUMN_NAME || ') from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10000"."ORDERS") / ' ||
'(select count(*) from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10000"."ORDERS")) as ' || COLUMN_NAME, ',') as SQL_STATEMENT
from "SNOWFLAKE_SAMPLE_DATA"."INFORMATION_SCHEMA"."COLUMNS"
where TABLE_CATALOG = 'SNOWFLAKE_SAMPLE_DATA' and TABLE_SCHEMA = 'TPCH_SF10000' and TABLE_NAME = 'ORDERS'
);
If you want the stored procedure, until it's published on Snowflake's blog it's available here: https://snowflake.pavlik.us/index.php/2021/01/22/running-dynamic-sql-in-snowflake/

Related

Select query using json format value

If customer first_name-'Monika',
last_name='Awasthi'
Then I am using below query to return value in json format:
SELECT *
FROM
(
SELECT JSON_ARRAYAGG(JSON_OBJECT('CODE' IS '1','VALUE' IS 'Monika'||' '||'Awasthi'))
FROM DUAL
);
It is working fine & give below output:
[{"CODE":"1","VALUE":"Monika Awasthi"}]
But I want one more value which should be reversed means output should be:
[{"CODE":"1","VALUE":"Monika Awasthi"},{"CODE":"2","VALUE":"Awasthi Monika"}]
Kindly give me some suggestions. Thank You
Another approach is to use a CTE to generate the two codes and values; your original version could be written to get the name data from a table or CTE:
-- CTE for sample data
WITH cte (first_name, last_name) AS (
SELECT 'Monika', 'Awasthi' FROM DUAL
)
-- query against CTE or table
SELECT JSON_ARRAYAGG(JSON_OBJECT('CODE' IS '1','VALUE' IS last_name ||' '|| first_name))
FROM cte;
And you could then extend that with a CTE that generates the value with the names in both orders:
WITH cte1 (first_name, last_name) AS (
SELECT 'Monika', 'Awasthi' FROM DUAL
),
cte2 (code, value) AS (
SELECT 1 AS code, first_name || ' ' || last_name FROM cte1
UNION ALL
SELECT 2 AS code, last_name || ' ' || first_name FROM cte1
)
SELECT JSON_ARRAYAGG(JSON_OBJECT('CODE' IS code,'VALUE' IS value))
FROM cte2;
which gives:
JSON_ARRAYAGG(JSON_OBJECT('CODE'ISCODE,'VALUE'ISVALUE))
-------------------------------------------------------------------------
[{"CODE":1,"VALUE":"Monika Awasthi"},{"CODE":2,"VALUE":"Awasthi Monika"}]
db<>fiddle
A simple logic through use of SQL(without using PL/SQL) in order to generate code values as only be usable for two columns as in this case might be
SELECT JSON_ARRAYAGG(
JSON_OBJECT('CODE' IS tt.column_id,
'VALUE' IS CASE WHEN column_id=1
THEN name||' '||surname
ELSE surname||' '||name
END)
) AS result
FROM t
CROSS JOIN (SELECT column_id FROM user_tab_cols WHERE table_name = 'T') tt
where t is a table which hold name and surname columns
Demo
More resilient solution might be provided through use of PL/SQL, even more columns exist within the data source such as
DECLARE
v_jso VARCHAR2(4000);
v_arr OWA.VC_ARR;
v_arr_t JSON_ARRAY_T := JSON_ARRAY_T();
BEGIN
FOR c IN ( SELECT column_id FROM user_tab_cols WHERE table_name = 'T' )
LOOP
SELECT 'JSON_OBJECT( ''CODE'' IS '||MAX(c.column_id)||',
''VALUE'' IS '||LISTAGG(column_name,'||'' ''||')
WITHIN GROUP (ORDER BY ABS(column_id-c.column_id))
||' )'
INTO v_arr(c.column_id)
FROM ( SELECT * FROM user_tab_cols WHERE table_name = 'T' );
EXECUTE IMMEDIATE 'SELECT '||v_arr(c.column_id)||' FROM t' INTO v_jso;
v_arr_t.APPEND(JSON_OBJECT_T(v_jso));
END LOOP;
DBMS_OUTPUT.PUT_LINE(v_arr_t.STRINGIFY);
END;
/
Demo
As I explained in a comment under your question, I am not clear on how you define the CODE values for your JSON string (assuming you have more than one customer).
Other than that, if you need to create a JSON array of objects from individual strings (as in your attempt), you probably need to use JSON_ARRAY rather than JSON_ARRAYAGG. Something like I show below. Incidentally, I also don't know why you needed to SELECT * FROM (subquery) - the outer SELECT seems entirely unnecessary.
So, if you don't actually aggregate over a table, but just need to build a JSON array from individual pieces:
select json_array
(
json_object('CODE' is '1', 'VALUE' is first_name || ' ' || last_name ),
json_object('CODE' is '2', 'VALUE' is last_name || ' ' || first_name)
) as result
from ( select 'Monika' as first_name, 'Awasthi' as last_name from dual )
;
RESULT
------------------------------------------------------------------------------
[{"CODE":"1","VALUE":"Monika Awasthi"},{"CODE":"2","VALUE":"Awasthi Monika"}]

Oracle: Count non-null fields for each column in a table

I need a query to count the total number of non-null values for each column in a table. Since my table has hundreds of columns I'm looking for a solution that only requires me to input the table name.
Perhaps using the result of:
select COLUMN_NAME from ALL_TAB_COLUMNS where TABLE_NAME='ORDERS';
to get the column names and then a subquery to put counts against each column name? The additional complication is that I only have read-only access to the DB so I can't create any temp tables.
Slightly out of my league with this one so any help is appreciated.
Construct the query in SQL or using a spreadsheet. Then run the query.
For instance, assuming that your column names are simple and don't have special characters:
select replace('select ''[col]'', count([col]) from orders union all ',
'[col]', COLUMN_NAME
) as sql
from ALL_TAB_COLUMNS
where TABLE_NAME = 'ORDERS';
(Of course, this can be adapted for more complex column names, but I'm trying to show the idea.)
Then copy the code, remove the final union all and run it.
You can put this in one string if there are not too many columns:
select listagg(replace('select ''[col]'', count([col]) from orders',
'[col]', COLUMN_NAME
), ' union all '
) within group (order by column_name) as sql
from ALL_TAB_COLUMNS
where TABLE_NAME = 'ORDERS';
You can also use execute immediate using the same query, but that seems like overkill.
If you're happy with the results row-ar rather than column-ar:
SELECT 'SELECT ''dummy'', 0 FROM DUAL' FROM DUAL
UNION ALL
SELECT
' UNION ALL SELECT ''' ||
column_name ||
''', COUNT(' ||
column_name ||
') FROM ' ||
TABLE_NAME
FROM
all_tab_columns
WHERE
table_name = 'ORDERS'
This is an "SQL that writes an SQL" that you can then copy and run to get your answers. Should make a resultset that looks like:
SELECT 'dummy', 0 FROM dual
UNION ALL SELECT 'col1', COUNT(col1) FROM ORDERS
UNION ALL SELECT 'col2', COUNT(col2) FROM ORDERS
...
If you want your results column-ar:
SELECT 'SELECT '
UNION ALL
SELECT
'COUNT(' ||
column_name ||
') as count_' ||
column_name ||
', ' ||
TABLE_NAME
FROM
all_tab_columns
WHERE
table_name = 'ORDERS'
UNION ALL
SELECT 'null as dummy_column FROM ORDERS'
Should make a resultset that looks like:
SELECT
COUNT(col1) as count_col1,
COUNT(col2) as count_col2,
...
null as dummycoll FROM orders
Caveat: I don't have oracle installed anywhere I can test these, it's written from memory and may need some debugging
This will generate the SQL to get the counts in columns and will handle case sensitive column names and column names with non-alpha-numeric characters:
SELECT 'SELECT '
|| LISTAGG(
'COUNT("' || column_name || '") AS "' || column_name || '"',
', '
) WITHIN GROUP ( ORDER BY column_id )
|| ' FROM "' || table_name || '"' AS sql
FROM ALL_TAB_COLUMNS
WHERE TABLE_NAME = 'ORDERS'
GROUP BY TABLE_NAME;
or, if you have a large number of columns that is generating a string longer than 4000 characters you can use a custom aggregation function to aggregate VARCHAR2s into a CLOB and then do:
SELECT 'SELECT '
|| CLOBAgg( 'COUNT("' || column_name || '") AS "' || column_name || '"' )
|| ' FROM "' || table_name || '"' AS sql
FROM ALL_TAB_COLUMNS
WHERE TABLE_NAME = 'ORDERS'
GROUP BY TABLE_NAME;
In Oracle 19 (I used similar code in Ora 12, maybe that works too), this works without generating another select to execute:
select * from
(
select table_name, column_name,
to_number( extractvalue( xmltype(dbms_xmlgen.getxml('select count(to_char(substr('||column_name||',1,1))) c from '||table_name)) ,'/ROWSET/ROW/C')) count
from all_tab_columns where owner = user
)
--where table_name = 'MY_TABLE'
;
It will create XML with count, from which it extracts the current count. The substr and to_char functions here are used to extract first character, so it will works with CLOB columns also

Vertica. Count of Null and Not-Null of all columns of a Table

How can we get null and non-null counts of all columns of a Table in Vertica? Table can have n number of columns and for each column we need to get count of nulls and non-nulls values of that table.
For Example.
Below Table has two columns
column1 Column2
1 abc
pqr
3
asd
5
If its a specific column then we can check like
SELECT COUNT(*) FROM table where column1 is null;
SELECT COUNT(*) FROM table where column1 is not null;
Same query for column2
I checked system tables like projection_storage and others but I cant figure out a generic query which gives details by hard coding only TABLE NAME in the query.
Hello #user2452689: Here is a dynamically generated VSQL statement which meets your requirement of counting nulls & not nulls in N columns. Notice that this writes a temporary SQL file out to your working directory, and then execute it via the \i command. You only need to change the first two variables per table. Hope this helps - good luck! :-D
--CHANGE SCHEMA AND TABLE PARAMETERS ONLY:
\set table_schema '\'public\''
\set table_name '\'dim_promotion\''
---------
\o temp_sql_file
\pset tuples_only
select e'select \'' || :table_schema || e'\.' || :table_name || e'\' as table_source' as txt
union all
select * from (
select
', sum(case when ' || column_name || ' is not null then 1 else 0 end) as ' || column_name || '_NOT_NULL
, sum(case when ' || column_name || ' is null then 1 else 0 end) as ' || column_name || '_NULL' as txt
from columns
where table_schema = :table_schema
and table_name = :table_name
order by ordinal_position
) x
union all
select ' from ' || :table_schema || e'.' || :table_name || ';' as txt ;
\o
\pset tuples_only
\i temp_sql_file
You can use:
select count(*) as cnt,
count(column1) as cnt_column1,
count(column2) as cnt_column2
from t;
count() with a column name or expression counts the number of non-NULL values in the column/expression.
(Obviously, the number of NULL values is cnt - cnt_columnX.)
select column1_not_null
,column2_not_null
,column3_not_null
,cnt - column1_not_null as column1_null
,cnt - column2_not_null as column2_null
,cnt - column3_not_null as column3_null
from (select count(*) as cnt
,count (column1) as column1_not_null
,count (column2) as column2_not_null
,count (column3) as column3_not_null
from mytable
) t

Get column names in subquery, and then return values for those columns?

Seems like this is impossible, but I'm so close - maybe someone can take me the last step...
I have a bunch of dynamic code and I don't always know the tables and columns I'm going to be dealing with, but I do know that VARCHAR2 columns with data_lengths of 2000 result in errors. I'd love to be able to identify these 'bad' columns dynamically, and remove them from my results in 1 shot.
This code:
SELECT LISTAGG(probs.column_name, ', ')
WITHIN GROUP (ORDER BY column_name) FROM
(select 1 grp, column_name
from all_tab_columns
where TABLE_NAME = 'MYTABLE' AND
DATA_TYPE <> 'VARCHAR2' AND
DATA_LENGTH < 2000
) probs
GROUP BY GRP
Gives me a nice comma, separated list of all of my acceptable column names like this:
FIELD1, FIELD2, FIELD3, FIELD4...
And I am hopeful that there's a way a can simply do something to drop that list of field names into a select statement like this:
SELECT (<my subquery, above>)
FROM MYTABLE;
Is this possible?
Assuming this situation
create table mytable ( a number, b number, c number)
insert into mytable values (10, 20, 30)
insert into mytable values (1, 2, 3)
and that only exists one table with that name (otherwise you should specify the owner in the query from all_tab_columns), your query could be simplified this way:
SELECT 'select ' || LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_name) || ' from ' || table_name
FROM all_tab_columns
WHERE TABLE_NAME = 'MYTABLE'
AND DATA_TYPE <> 'VARCHAR2'
AND DATA_LENGTH < 2000
GROUP BY table_name
this would give: select A, B, C from MYTABLE.
The problem here is that you can not simply run a statement that returns a variable number of columns; one way to use this could be building an xml:
SELECT xmltype(
DBMS_XMLGEN.getxml(
( SELECT 'select ' || LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_name) || ' from ' || table_name
FROM all_tab_columns
WHERE TABLE_NAME = 'MYTABLE'
AND DATA_TYPE <> 'VARCHAR2'
AND DATA_LENGTH < 2000
GROUP BY table_name)
)
)
FROM DUAL
<?xml version="1.0"?>
<ROWSET>
<ROW>
<A>10</A>
<B>20</B>
<C>30</C>
</ROW>
<ROW>
<A>1</A>
<B>2</B>
<C>3</C>
</ROW>
</ROWSET>
Another way could be using some PLSQL and dynamic SQL, with a little modification of yur query to concatenate the fields, to build the result in a unique string:
declare
type tTabResults is table of varchar2(1000);
vSQL varchar2(1000);
vTabResults tTabResults;
begin
SELECT 'select ' || LISTAGG( column_name, '|| '', '' ||') WITHIN GROUP (ORDER BY column_name) || ' from ' || table_name
into vSQL
FROM all_tab_columns
WHERE TABLE_NAME = 'MYTABLE'
AND DATA_TYPE <> 'VARCHAR2'
AND DATA_LENGTH < 2000
GROUP BY table_name;
--
execute immediate vSQL bulk collect into vTabResults;
--
for i in vTabResults.first .. vTabResults.last loop
dbms_output.put_line(vTabResults(i));
end loop;
end;
10, 20, 30
1, 2, 3
Notice that I oversimplified the problem, treating numbers as strings and not using any conversion, by simply printing the values in your table, no matter their type; in a real solution you should handle the possible types of your columns and modify the initial query to add some type conversions.

trying to execute a query within a query

I have a query that display another query I need to execute:
So the first part just writes out as text the first part of the query I want to execute
SELECT distinct 'SELECT COUNT(txn_id) FROM '
I then add on all the tables that I want to execute that initial part of the query on
table_name from all_tab_columns WHERE OWNER='RGSWKF_PRGM' AND COLUMN_NAME like '%TXN_ID%';
So my complete query is
SELECT distinct 'SELECT COUNT(txn_id) FROM ' || table_name from all_tab_columns WHERE OWNER='RGSWKF_PRGM' AND COLUMN_NAME like '%TXN_ID%';
This gives me a list of the queries I want to execute like so:
SELECT COUNT(txn_id) FROM MEETING_TXN_LIST
SELECT COUNT(txn_id) FROM TXN_COMMENT
SELECT COUNT(txn_id) FROM TXN_DEAL_FEE
....etc. I was told once I have this result I can auto execute the queries that are created as result of this by adding something to my original query but I can't find anything as of yet?
So basically I want it to execute from one query:
SELECT COUNT(txn_id) FROM MEETING_TXN_LIST
then
SELECT COUNT(txn_id) FROM TXN_COMMENT
then
SELECT COUNT(txn_id) FROM TXN_DEAL_FEE
etc. all in one query.
union
using union all with single quotes gives me the result with the text
SELECT COUNT(txn_id) FROM TXN_COMMENT union all ..etc...
Without the single quotes gives me the following error
ORA-00936: missing expression
00936. 00000 - "missing expression"
I would suggest that you generate a query with the subqueries connected by union all:
SELECT 'MEETING_TXN_LIST' as table_name, COUNT(txn_id) as cnt FROM MEETING_TXN_LIST UNION ALL
SELECT 'TXN_COMMENT', COUNT(txn_id) FROM TXN_COMMENT union all
SELECT 'TXN_DEAL_FEE' COUNT(txn_id) FROM TXN_DEAL_FEE;
The query for this is basically:
SELECT 'SELECT ''' || table_name || ''' as table_name, COUNT(txn_id) as cnt
FROM ' || table_name || ' union all '
from all_tab_columns
WHERE OWNER = 'RGSWKF_PRGM' AND COLUMN_NAME = 'TXN_ID';
Note that you need to remove the final union all from the last row. And, I changed the query to look only for the column TXN_ID, because that is what you are using in the queries.
select txt
|| case
when row_number() over (order by rn desc) = 1 then null
else ul
end
from (select 'SELECT '''
|| table_name
|| ''' as table_name, COUNT(txn_id) as cnt FROM '
|| table_name
as txt
,' union all ' ul
,rownum rn
from all_tab_columns
where OWNER = 'RGSWKF_PRGM' and COLUMN_NAME = 'TXN_ID')
order by rn