Vertica. Count of Null and Not-Null of all columns of a Table - sql

How can we get null and non-null counts of all columns of a Table in Vertica? Table can have n number of columns and for each column we need to get count of nulls and non-nulls values of that table.
For Example.
Below Table has two columns
column1 Column2
1 abc
pqr
3
asd
5
If its a specific column then we can check like
SELECT COUNT(*) FROM table where column1 is null;
SELECT COUNT(*) FROM table where column1 is not null;
Same query for column2
I checked system tables like projection_storage and others but I cant figure out a generic query which gives details by hard coding only TABLE NAME in the query.

Hello #user2452689: Here is a dynamically generated VSQL statement which meets your requirement of counting nulls & not nulls in N columns. Notice that this writes a temporary SQL file out to your working directory, and then execute it via the \i command. You only need to change the first two variables per table. Hope this helps - good luck! :-D
--CHANGE SCHEMA AND TABLE PARAMETERS ONLY:
\set table_schema '\'public\''
\set table_name '\'dim_promotion\''
---------
\o temp_sql_file
\pset tuples_only
select e'select \'' || :table_schema || e'\.' || :table_name || e'\' as table_source' as txt
union all
select * from (
select
', sum(case when ' || column_name || ' is not null then 1 else 0 end) as ' || column_name || '_NOT_NULL
, sum(case when ' || column_name || ' is null then 1 else 0 end) as ' || column_name || '_NULL' as txt
from columns
where table_schema = :table_schema
and table_name = :table_name
order by ordinal_position
) x
union all
select ' from ' || :table_schema || e'.' || :table_name || ';' as txt ;
\o
\pset tuples_only
\i temp_sql_file

You can use:
select count(*) as cnt,
count(column1) as cnt_column1,
count(column2) as cnt_column2
from t;
count() with a column name or expression counts the number of non-NULL values in the column/expression.
(Obviously, the number of NULL values is cnt - cnt_columnX.)

select column1_not_null
,column2_not_null
,column3_not_null
,cnt - column1_not_null as column1_null
,cnt - column2_not_null as column2_null
,cnt - column3_not_null as column3_null
from (select count(*) as cnt
,count (column1) as column1_not_null
,count (column2) as column2_not_null
,count (column3) as column3_not_null
from mytable
) t

Related

Count number of null values for every column on a table

I would like to calculate, for each column in a table, the percent of rows that are null.
For one column, I was using:
SELECT ((SELECT COUNT(Col1)
FROM Table1)
/
(SELECT COUNT(*)
FROM Table1)) AS Table1Stats
Works great and is fast.
However, I want to do this for all ~50 columns of the table, and my environment does not allow me to use dynamic SQL.
Any recommendations? I am using snowflake to connect to AWS, but as an end user I am using the snowflake browser interface.
You can combine this as:
SELECT COUNT(Col1) * 1.0 / COUNT(*)
FROM Table1;
Or, if you prefer:
SELECT AVG( (Col1 IS NOT NULL)::INT )
FROM Table1;
You can use a mix of object_construct() and flatten() to move the column names into rows. Then do the math for the values missing:
create or replace temp table many_cols as
select 1 a, 2 b, 3 c, 4 d
union all select 1, null, 3, 4
union all select 8, 8, null, null
union all select 8, 8, 7, null
union all select null, null, null, null;
select key column_name
, 1-count(*)/(select count(*) from many_cols) ratio_null
from (
select object_construct(a.*) x
from many_cols a
), lateral flatten(x)
group by key
;
You can do this using a SQL generator if you don't mind copying the text and running it once it's done.
-- SQL generator option:
select 'select' || listagg(' ((select count(' || COLUMN_NAME || ') from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10000"."ORDERS") / ' ||
'(select count(*) from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10000"."ORDERS")) as ' || COLUMN_NAME, ',') as SQL_STATEMENT
from "SNOWFLAKE_SAMPLE_DATA"."INFORMATION_SCHEMA"."COLUMNS"
where TABLE_CATALOG = 'SNOWFLAKE_SAMPLE_DATA' and TABLE_SCHEMA = 'TPCH_SF10000' and TABLE_NAME = 'ORDERS'
;
If the copy and paste is not plausible because you need to script it, you can use the results of the SQL generator in a stored procedure I wrote to execute a single line of dynamic SQL:
call run_dynamic_sql(
select 'select' || listagg(' ((select count(' || COLUMN_NAME || ') from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10000"."ORDERS") / ' ||
'(select count(*) from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF10000"."ORDERS")) as ' || COLUMN_NAME, ',') as SQL_STATEMENT
from "SNOWFLAKE_SAMPLE_DATA"."INFORMATION_SCHEMA"."COLUMNS"
where TABLE_CATALOG = 'SNOWFLAKE_SAMPLE_DATA' and TABLE_SCHEMA = 'TPCH_SF10000' and TABLE_NAME = 'ORDERS'
);
If you want the stored procedure, until it's published on Snowflake's blog it's available here: https://snowflake.pavlik.us/index.php/2021/01/22/running-dynamic-sql-in-snowflake/

How can i get a count(*) of all the columns in a table? Using PostgreSql

I have bunch of tables where several of them have hundreds of columns. I need to get a count of non-null values for each column and I've been doing it manually. I would like to figure out a way to get all the counts for all the columns in a table. I looked up stackoverflow and google, but unable to find the answer.
I tried this but it's just returning a value of 1 for each column. I know it's just counting the number of column and not the values in each column. Any suggestions?
select count(COLUMN_NAME)
from information_schema.columns
where table_schema = 'schema_name'
and table_name = 'table_name'
group by COLUMN_NAME
COUNT(column_name) always gives you the count of NON NULL values.
Create a generic function like this which can take schema name and table name as arguments.
Here I am constructing select statements joined together by UNION ALLs each returning the value of the column_name and it's count for all columns when executed dynamically.
CREATE OR REPLACE FUNCTION public.get_count( TEXT, TEXT )
RETURNS TABLE(t_column_name TEXT, t_count BIGINT )
LANGUAGE plpgsql
AS $BODY$
DECLARE
p_schema TEXT := $1;
p_tabname TEXT := $2;
v_sql_statement TEXT;
BEGIN
SELECT STRING_AGG( 'SELECT '''
|| column_name
|| ''','
|| ' count('
|| column_name
|| ') FROM '
|| table_schema
|| '.'
|| table_name
,' UNION ALL ' ) INTO v_sql_statement
FROM information_schema.columns
WHERE table_schema = p_schema
AND table_name = p_tabname;
IF v_sql_statement IS NOT NULL THEN
RETURN QUERY EXECUTE v_sql_statement;
END IF;
END
$BODY$;
Execution
knayak=# select c.col, c.count from
public.get_count( 'public', 'employees' ) as c(col,count);
col | count
----------------+-------
employee_id | 107
first_name | 107
last_name | 107
email | 107
phone_number | 107
hire_date | 107
job_id | 107
salary | 107
commission_pct | 35
manager_id | 106
department_id | 106
(11 rows)
There's not really a magic way to do this. If you need to check each of 100 different columns to see how many non-null values there are, you'll have to specify each of the columns of the table.
About the best you can do is use the system catalogs to help write your queries:
select 'SUM(CASE WHEN ' + column_name + ' IS NULL THEN 1 ELSE 0 END) AS ' + column_name
from information_schema.columns
where table_schema = 'schema_name'
and table_name = 'table_name'
and is_nullable = 'YES'
You may need to add quoted identifiers if you've got spaces or other special characters in your column names.
Then you can copy that output to another query and add the missing parts of the query. I've added and is_nullable = 'YES' because it's a waste of time to check NOT NULL columns. As far as I know, that column is present in PostgreSQL.
Try this
SELECT
sum(case when column1 is not null then 1 else 0 end) as col1_not_null_count
sum(case when column2 is not null then 1 else 0 end) as col2_not_null_count
sum(case when column3 is not null then 1 else 0 end) as col3_not_null_count
FROM information_schema.columns
WHERE table_schema = 'schema_name'
AND table_name = 'table_name'
The best way I've found to do this is to write a case statement to make a non-null value in a column become a 1 and a null become a 0. Then I sum the case to get a count of the non-null values:
SELECT SUM(CASE WHEN COLUMN_NAME1 IS NULL THEN 1 ELSE 0 END) AS COL1_COUNT
, SUM(CASE WHEN COLUMN_NAME2 IS NULL THEN 1 ELSE 0 END) AS COL2_COUNT
FROM TABLE_NAME
I see in your select that you are looking at the information_schema.columns table. You can dynamically generate the code above by a select from that table:
SELECT ', SUM(CASE WHEN ' + column_name + ' IS NULL THEN 1 ELSE 0 END) AS ' + column_name + '_COUNT'
FROM information_schema.columns
WHERE table_schema = 'schema_name'
AND table_name = 'table_name'
You can also dynamically create a different select for every column in the table in question:
SELECT 'SELECT SUM(CASE WHEN ' + column_name + ' IS NULL THEN 1 ELSE 0 END) AS ' + column_name + '_COUNT FROM ' + table_schema + '.' + table_name
FROM information_schema.columns
WHERE table_schema = 'schema_name'
AND table_name = 'table_name'
I came here looking to answer the same question, but I wanted to do this without making a function, as I don't always have the ability to do that in the databases I'm working with. But the databases I work with have the tblfunc module installed, which has the crosstab function, that takes an sql string as input. I was able to produce sql that I could use in that function to get all the column counts from any table. Here is the code I used, where I put in the schema_name and table_name of what I wanted:
select * from crosstab('select column_name, ''num'' as category, sum(num) from (' || (
select
string_agg(sql, '') as sql_string
from (
select
case when row_number() OVER () = 1 then '' else ' union all ' end ||
'select ''' || column_name || ''' as column_name, ' || 'count(' || column_name || ') as num from ' || table_schema || '.' || table_name as sql
from information_schema.columns
where table_schema = 'schema_name'
and table_name = 'table_name') as sql_query limit 1
) || ') as column_counts group by column_name, category')
AS t(column_name text, num numeric) order by num asc

Oracle get table names based on column value

I have table like this:
Table-1
Table-2
Table-3
Table-4
Table-5
each table is having many columns and one of the column name is employee_id.
Now, I want to write a query which will
1) return all the tables which is having this columns and
2) results should show the tables if the column is having values or empty values by passing employee_id.
e.g. show table name, column name from Table-1, Table-2,Table-3,... where employee_id='1234'.
If one of the table doesn't have this column, then it is not required to show.
I have verified with link, but it shows only table name and column name and not by passing some column values to it.
Also verified this, but here verifies from entire schema which I dont want to do it.
UPDATE:
Found a solution, but by using xmlsequence which is deprecated,
1)how do I make this code as xmltable?
2) If there are no values in the table, then output should have empty/null. or default as "YES" value
WITH char_cols AS
(SELECT /*+materialize */ table_name, column_name
FROM cols
WHERE data_type IN ('CHAR', 'VARCHAR2') and table_name in ('Table-1','Table-2','Table-3','Table-4','Table-5'))
SELECT DISTINCT SUBSTR (:val, 1, 11) "Employee_ID",
SUBSTR (table_name, 1, 14) "Table",
SUBSTR (column_name, 1, 14) "Column"
FROM char_cols,
TABLE (xmlsequence (dbms_xmlgen.getxmltype ('select "'
|| column_name
|| '" from "'
|| table_name
|| '" where upper("'
|| column_name
|| '") like upper(''%'
|| :val
|| '%'')' ).extract ('ROWSET/ROW/*') ) ) t ORDER BY "Table"
/
This query can be done in one step using the (non-deprecated) XMLTABLE.
Sample Schema
--Table-1 and Table-2 match the criteria.
--Table-3 has the right column but not the right value.
--Table-4 does not have the right column.
create table "Table-1" as select '1234' employee_id from dual;
create table "Table-2" as select '1234' employee_id from dual;
create table "Table-3" as select '4321' employee_id from dual;
create table "Table-4" as select 1 id from dual;
Query
--All tables with the column EMPLOYEE_ID, and the number of rows where EMPLOYEE_ID = '1234'.
select table_name, total
from
(
--Get XML results of dynamic query on relevant tables and columns.
select
dbms_xmlgen.getXMLType(
(
--Create a SELECT statement on each table, UNION ALL'ed together.
select listagg(
'select '''||table_name||''' table_name, count(*) total
from "'||table_name||'" where employee_id = ''1234'''
,' union all'||chr(10)) within group (order by table_name) v_sql
from user_tab_columns
where column_name = 'EMPLOYEE_ID'
)
) xml
from dual
) x
cross join
--Convert the XML data to relational.
xmltable('/ROWSET/ROW'
passing x.xml
columns
table_name varchar2(128) path 'TABLE_NAME',
total number path 'TOTAL'
);
Results
TABLE_NAME TOTAL
---------- -----
Table-1 1
Table-2 1
Table-3 0
Just try to use code below.
Pay your attention that may be nessecery clarify scheme name in loop.
This code works for my local db.
set serveroutput on;
DECLARE
ex_query VARCHAR(300);
num NUMBER;
emp_id number;
BEGIN
emp_id := <put your value>;
FOR rec IN
(SELECT table_name
FROM all_tab_columns
WHERE column_name LIKE upper('employee_id')
)
LOOP
num :=0;
ex_query := 'select count(*) from ' || rec.table_name || ' where employee_id = ' || emp_id;
EXECUTE IMMEDIATE ex_query into num;
if (num>0) then
DBMS_OUTPUT.PUT_LINE(rec.table_name);
end if;
END LOOP;
END;
I tried with the xml thing, but I get an error I cannot solve. Something about a zero size result. How difficult is it to solve this instead of raising exception?! Ask Oracle.
Anyway.
What you can do is use the COLS table to know what table has the employee_id column.
1) what table from table TABLE_LIKE_THIS (I assume column with table names is C) has this column?
select *
from COLS, TABLE_LIKE_THIS t
where cols.table_name = t
and cols.column_name = 'EMPLOYEE_ID'
-- think Oracle metadata/ think upper case
2) Which one has the value you are looking for: write a little chunk of Dynamic PL/SQL with EXECUTE IMMEDIATE to count the tables matching above condition
declare
v_id varchar2(10) := 'JP1829'; -- value you are looking for
v_col varchar2(20) := 'EMPLOYEE_ID'; -- column
n_c number := 0;
begin
for x in (
select table_name
from all_tab_columns cols
, TABLE_LIKE_THIS t
where cols.table_name = t.c
and cols.column_name = v_col
) loop
EXECUTE IMMEDIATE
'select count(1) from '||x.table_name
||' where Nvl('||v_col||', ''##'') = ''' ||v_id||'''' -- adding quotes around string is a little specific
INTO n_c;
if n_c > 0 then
dbms_output.put_line(n_C|| ' in ' ||x.table_name||' has '||v_col||'='||v_id);
end if;
-- idem for null values
-- ... ||' where '||v_col||' is null '
-- or
-- ... ||' where Nvl('||v_col||', ''##'') = ''##'' '
end loop;
dbms_output.put_line('done.');
end;
/
Hope this helps

Oracle: Count non-null fields for each column in a table

I need a query to count the total number of non-null values for each column in a table. Since my table has hundreds of columns I'm looking for a solution that only requires me to input the table name.
Perhaps using the result of:
select COLUMN_NAME from ALL_TAB_COLUMNS where TABLE_NAME='ORDERS';
to get the column names and then a subquery to put counts against each column name? The additional complication is that I only have read-only access to the DB so I can't create any temp tables.
Slightly out of my league with this one so any help is appreciated.
Construct the query in SQL or using a spreadsheet. Then run the query.
For instance, assuming that your column names are simple and don't have special characters:
select replace('select ''[col]'', count([col]) from orders union all ',
'[col]', COLUMN_NAME
) as sql
from ALL_TAB_COLUMNS
where TABLE_NAME = 'ORDERS';
(Of course, this can be adapted for more complex column names, but I'm trying to show the idea.)
Then copy the code, remove the final union all and run it.
You can put this in one string if there are not too many columns:
select listagg(replace('select ''[col]'', count([col]) from orders',
'[col]', COLUMN_NAME
), ' union all '
) within group (order by column_name) as sql
from ALL_TAB_COLUMNS
where TABLE_NAME = 'ORDERS';
You can also use execute immediate using the same query, but that seems like overkill.
If you're happy with the results row-ar rather than column-ar:
SELECT 'SELECT ''dummy'', 0 FROM DUAL' FROM DUAL
UNION ALL
SELECT
' UNION ALL SELECT ''' ||
column_name ||
''', COUNT(' ||
column_name ||
') FROM ' ||
TABLE_NAME
FROM
all_tab_columns
WHERE
table_name = 'ORDERS'
This is an "SQL that writes an SQL" that you can then copy and run to get your answers. Should make a resultset that looks like:
SELECT 'dummy', 0 FROM dual
UNION ALL SELECT 'col1', COUNT(col1) FROM ORDERS
UNION ALL SELECT 'col2', COUNT(col2) FROM ORDERS
...
If you want your results column-ar:
SELECT 'SELECT '
UNION ALL
SELECT
'COUNT(' ||
column_name ||
') as count_' ||
column_name ||
', ' ||
TABLE_NAME
FROM
all_tab_columns
WHERE
table_name = 'ORDERS'
UNION ALL
SELECT 'null as dummy_column FROM ORDERS'
Should make a resultset that looks like:
SELECT
COUNT(col1) as count_col1,
COUNT(col2) as count_col2,
...
null as dummycoll FROM orders
Caveat: I don't have oracle installed anywhere I can test these, it's written from memory and may need some debugging
This will generate the SQL to get the counts in columns and will handle case sensitive column names and column names with non-alpha-numeric characters:
SELECT 'SELECT '
|| LISTAGG(
'COUNT("' || column_name || '") AS "' || column_name || '"',
', '
) WITHIN GROUP ( ORDER BY column_id )
|| ' FROM "' || table_name || '"' AS sql
FROM ALL_TAB_COLUMNS
WHERE TABLE_NAME = 'ORDERS'
GROUP BY TABLE_NAME;
or, if you have a large number of columns that is generating a string longer than 4000 characters you can use a custom aggregation function to aggregate VARCHAR2s into a CLOB and then do:
SELECT 'SELECT '
|| CLOBAgg( 'COUNT("' || column_name || '") AS "' || column_name || '"' )
|| ' FROM "' || table_name || '"' AS sql
FROM ALL_TAB_COLUMNS
WHERE TABLE_NAME = 'ORDERS'
GROUP BY TABLE_NAME;
In Oracle 19 (I used similar code in Ora 12, maybe that works too), this works without generating another select to execute:
select * from
(
select table_name, column_name,
to_number( extractvalue( xmltype(dbms_xmlgen.getxml('select count(to_char(substr('||column_name||',1,1))) c from '||table_name)) ,'/ROWSET/ROW/C')) count
from all_tab_columns where owner = user
)
--where table_name = 'MY_TABLE'
;
It will create XML with count, from which it extracts the current count. The substr and to_char functions here are used to extract first character, so it will works with CLOB columns also

Comparing column by column between two rows in Oracle DB

I need to write a query to compare column by column (ie: find differences) between two rows in the database. For example:
row1: 10 40 sometext 24
row2: 10 25 sometext 24
After the query executed, it should shows only the fields that have difference (ie: the second field)
Here's what I have done so far:
select table1.column1, table1.column2, table1.column3, table1.column4
from table1
where somefield in (field1, field2);
The above query will show me two rows one above another like this:
10 40 sometext 24
10 25 sometext 24
Then I have to manually do the comparison and it takes a lot of time b/c the row contains a lot of column.
So again my question is: How can I write a query that will show me only the columns that have differences??
Thanks
Use UNPIVOT clause (see http://www.oracle-developer.net/display.php?id=506) to turn columns into rows, then filter out the same rows (using GROUP BY HAVING COUNT and finally use PIVOT to get rows with different columns only.
To do this easily you need to query the metadata for the table to get each row. You can use the following code as a script.
Replace the define table_name with your table name and define yes_drop_it = NO. Put your raw WHERE syntax into the where_clause. The comparison logic always compares the first two rows returned for the where clause.
whenever sqlerror exit failure rollback;
set linesize 150
define test_tab_name = tst_cf_cols
define yes_drop_it = YES
define order_by = 1, 2
define where_clause = 1 = 1
define tab_owner = user
<<clearfirst>> begin
for clearout in (
select 'drop table ' || table_name as cmd
from all_tables
where owner = &&tab_owner and table_name = upper('&&test_tab_name')
and '&&yes_drop_it' = 'YES'
) loop
execute immediate clearout.cmd;
execute immediate '
create table &&test_tab_name as
select 10 as column1, 40 as column2, ''sometext'' as column3, 24 as column4 from dual
union all
select 10 as column1, 25 as column2, ''sometext'' as column3, 24 as column4 from dual
';
end loop;
end;
/
column cfsynt format a4000 word_wrap new_value comparison_syntax
with parms as (select 'parmquery' as cte_name, 'row_a' as corr_name_1, 'row_b' as corr_name_2 from dual)
select
'select * from (select ' || LISTAGG(cfcol || ' AS cf_' || trim (to_char (column_id, '000')) || '_' || column_name
, chr(13) || ', ') WITHIN GROUP (order by column_id)
|| chr(13) || ' from (select * from parmquery where row_number = 1) ' || corr_name_1
|| chr(13) || ', (select * from parmquery where row_number = 2) ' || corr_name_2
|| chr(13) || ') where ''DIFFERENT'' IN (' || LISTAGG ('cf_' || trim (to_char (column_id, '000')) || '_' || column_name, chr(13) || ', ') within group (order by column_id) || ')'
as cfsynt
from parms, (
select
'decode (' || corr_name_1 || '.' || column_name || ', ' || corr_name_2
|| '.' || column_name || ', ''SAME'', ''DIFFERENT'')'
as cfcol,
column_name,
column_id
from
parms,
all_tab_columns
where
owner = &&tab_owner and table_name = upper ('&&test_tab_name')
);
with parmquery as (select rownum as row_number, vals.* from (
select * from &&test_tab_name
where &&where_clause
order by &&order_by
) vals
) &&comparison_syntax
;