Rank columns by their count of values - sql

I have a table with a bunch of boolean columns. I'd like to rank these columns by the count of true values each one has.
I found a way to count the number of true values in a column using:
SELECT count(CASE WHEN col1 THEN 1 ELSE null END) as col1,
count(CASE WHEN col2 THEN 1 ELSE null END) as col2
....
FROM my_table;
but this approach has two problems:
I have to manually type the names of the columns
I have to then transpose the result and order by value
Is there a way to do the whole operation one query?

This is not actually a crosstab job (or "pivot" in other RDBMS), but the reverse operation, "unpivot" if you will. One elegant technique is a VALUES expression in a LATERAL join.
The basic query can look like this, which takes care of:
I have to then transpose the result and order by value
SELECT c.col, c.ct
FROM (
SELECT count(col1 OR NULL) AS col1
, count(col2 OR NULL) AS col2
-- etc.
FROM tbl
) t
, LATERAL (
VALUES
('col1', col1)
, ('col2', col2)
-- etc.
) c(col, ct)
ORDER BY 2;
That was the simple part. Your other request is harder:
I have to manually type the names of the columns
This function takes your table name and retrieves meta data from the system catalog pg_attribute. It's a dynamic implementation of the above query, safe against SQL injection:
CREATE OR REPLACE FUNCTION f_true_ct(_tbl regclass)
RETURNS TABLE (col text, ct bigint)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('
SELECT c.col, c.ct
FROM (SELECT %s FROM tbl) t
, LATERAL (VALUES %s) c(col, ct)
ORDER BY 2 DESC'
, string_agg (format('count(%1$I OR NULL) AS %1$I', attname), ', ')
, string_agg (format('(%1$L, %1$I)', attname), ', ')
)
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible, legal table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
AND atttypid = 'bool'::regtype -- only character types
);
END
$func$;
Call:
SELECT * FROM f_true_ct('tbl'); -- table name optionally schema-qualified
Result:
col | ct
------+---
col1 | 3
col3 | 2
col2 | 1
Works for any table to rank all boolean columns by their count of true values.
To understand the function parameter, read this:
Table name as a PostgreSQL function parameter
Related answers with more explanation:
Check whether empty strings are present in character-type columns
Replace empty strings with null values

If I understand correctly, you can do this with a giant union all:
select c.*
from ((select 'col1' as which, sum(case when col1 then 1 else 0 end) as cnt from t
) union all
(select 'col2' as which, sum(case when col2 then 1 else 0 end) as cnt from t
) union all
. . .
) c
order by cnt desc;
Although you still need to type the results, this does sidestep the transpositions.

Related

Count value across multiple columns

I am looking to count the number of times set of values occurred in a table. These values could occur in up to 10 different columns. I need to increment the count regardless of which column it is in. I know how I could count if they were all in the same column but not spanning multiple columns.
Values can be added in any order. I have about a thousand
Cpt1 Cpt2 Cpt3 Cpt4 Cpt5
63047 63048 63048 NULL NULL
I would want to for this row I'd expect this as the result
63047 1
63048 2
You could use a union all call to treat them as one column:
SELECT col, COUNT(*)
FROM (SELECT col1 FROM mytable
UNION ALL
SELECT col2 FROM mytable
UNION ALL
SELECT col3 FROM mytable
-- etc...
) t
GROUP BY col
It's not entirely clear what your table exactly looks like, but I'm guessing that what you're looking for is:
SELECT row_count = COUNT(*),
row_count_with_given_value = SUM ( CASE WHEN field1 = 'myValue' THEN 1
WHEN field2 = 'myValue' THEN 1
WHEN field3 = 'myValue' THEN 1
WHEN field4 = 'myValue' THEN 1 ELSE 0 END)
FROM myTable
Assuming the fieldx columns are not NULL-able, you could write it like this too:
SELECT row_count = COUNT(*),
row_count_with_given_value = SUM ( CASE WHEN 'myValue' IN (field1, field2, field3, field4) THEN 1 ELSE 0 END)
FROM myTable
Something like this might work (after adapting to your value domain and data types):
create table t1
(i1 int,
i2 int,
i3 int);
insert into t1 values (1,0,0);
insert into t1 values (1,1,1);
insert into t1 values (1,0,0);
declare #i int = 0;
select #i = #i + i1 + i2 + i3 from t1;
print #i;
drop table t1;
Output is: 5
Many databases support lateral joins, of one type of another. These can be used to simplify this operation. Using the SQL Server/Oracle 12C syntax:
select v.cpt, count(*)
from t cross apply
(values (cpt1), (cpt2), . . .
) v(cpt)
where cpt is not null
group by v.cpt;

How to find count of multiple columns in sql group by id?

I have a table with schema like below:
root
|id
|name
|col1
|col2
|...
|col30
Conditions are that multiple rows can have the same name (they're not primary key - the key is the ID). Values in col1-col30 will be some string, or it can have the string "null".
I'm interested in the number of columns filled in for each name.
For example,
if name "test1" has col1-5 filled in a row, and another row has "test1" and have col1, 3, 10, 6 filled in (and the rest of unfilled columns are just string value "null"), "test1" should have value 9.
I'm pretty new to SQL and have been looking this up.. Please help.
Give this a try:
SELECT
name,
CASE WHEN col1_max IS NOT NULL THEN 1 ELSE 0 END + -- Only include non-NULL values
CASE WHEN col2_max IS NOT NULL THEN 1 ELSE 0 END
FROM (
SELECT
name,
MAX(col1) AS col1_max, -- Non-NULL values come before NULL
MAX(col2) AS col2_max
FROM MyTable
GROUP BY name
) src
You can add more the rest of the columns to fit your case.
Updated
I just realized your NULL case is with a "null" string. Modified:
SELECT
name,
CASE WHEN col1_max IS NOT NULL THEN 1 ELSE 0 END + -- Only include non-NULL values
CASE WHEN col2_max IS NOT NULL THEN 1 ELSE 0 END
FROM (
SELECT
name,
MAX(CASE WHEN col1 = 'null' THEN NULL ELSE col1 END) AS col1_max, -- Non-NULL values come before NULL
MAX(CASE WHEN col2 = 'null' THEN NULL ELSE col2 END) AS col2_max
FROM MyTable
GROUP BY name
) src
First you unpivot your table and count those rows that have not null values. In postgres, you can achieve this with unnest. I have only used col1..7 -- change to upto col30 in your case
WITH t AS(
SELECT id,name,
unnest(array['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']) AS colname,
unnest(array[col1, col2, col3, col4, col5, col6, col7]) AS colvalue
FROM your_table)
SELECT id, name,
SUM(CASE WHEN colvalue IS NULL THEN 0 ELSE 1 END) AS count_filled
FROM t
GROUP BY 1,2;

How can I acces the output from the first select statement

I have a table Like this
Col1 | Col2
-----------
a | d
b | e
c | a
Now I want to create an statement to get an output like this:
First| Second
-------------------
a | Amsterdamm
b | Berlin
c | Canada
...
So far I have this consturct what is not working
SELECT *
FROM(
SELECT DISTINCT
CASE
when Col1 IS NULL then 'NA'
else Col1
END
FROM Table1
UNION
SELECT DISTINCT
CASE
when Col2 IS NULL then 'NA'
else Col2
END
FROM Table1
) AS First
,
(
SELECT DISTINCT
when First= 'a' then 'Amsterdam'
when First= 'b' then 'Berlin'
when First= 'c' then 'Canada'
) AS Second
;
can you help me with that
Sorry I have to edit my question to be more specific.
Not as familiar with DB2... I'll lookup if it has a concat function in a sec... and it does.
SELECT First, case when first = 'a' then
concat('This is a ',first)
case when first = 'b' then
concat('To Be or not to ',first)
case else
concat('This is a ',first) end as Second
FROM (
SELECT coalesce(col1, 'NA') as First
FROM Table
UNION
SELECT coalesce(col2, 'NA')
FROM table) SRC
WHERE first <> 'NA'
What this does is generate a single inline view called src with a column called first. If col1 or col2 of table are null then it substitutes NA for that value. It then concatenates first and the desired text excluding records with a first value of 'NA'
Or if you just create an inline table with the desired values and join in...
SELECT First, x.b as Second
FROM (
SELECT coalesce(col1, 'NA') as First
FROM Table
UNION
SELECT coalesce(col2, 'NA')
FROM table) SRC
INNER JOIN (select a,b
from (values ('a', 'This is a'),
('b', 'To B or not to' ),
('c', 'I like cat whose name starts with')) as x(a,b)) X;
on X.a = src.first
WHERE first <> 'NA'
Personally I find the 2nd option easier to read. Though if you have meaning for a,b,c I would think you'd want that stored in a table somewhere for additional access. In code seems like a bad place to store data like this that could change.
Assuming you want
a this is a a
b this is a b
c this is a c
d this is a d
e this is a e
thanks to xQbert
I could solve this problem like this
SELECT FirstRow, concat
(
CASE FirstRow
WHEN 'AN' then 'amerstdam'
WHEN 'G' then 'berlin'
ELSE 'NA'
END, ''
) AS SecondRow
FROM(
Select coalesce (Col1, 'NA') as FirstRow
FROM Table1
UNION
Select coalesce (Col2, 'NA')
FROM Table1) SRC
WHERE FirstRow <> 'NA'
;

Error handling or Exception handling in Netezza

I'm running a netezza sql process as part of a shell script and in one of the sql codes, I want it to raise an ERROR or exception if the number of rows from 2 different tables don't match.
SQL Code:
/* The following 2 tables should return the same number of rows to make sure the process is correct */
select count(*)
from (
select distinct col1, col2,col3
from table_a
where week > 0 and rec >= 1
) as x ;
select count(*)
from (
select distinct col1, col2, col3
from table_b
) as y ;
How do I compare the 2 row counts and raise an exception/ERROR in the netezza SQL process, so that it exits the process, if the 2 row counts aren't equal ?
I agree a script is the best option. However you could still do the check in your SQL itself by using a cross join
Select a.*
from Next_Step_table a cross join
(select case when y.y_cnt is null then 'No Match' else 'Match' end as match
from (select count(*) as x_cnt
from ( select distinct col1, col2,col3
from table_a
where week > 0 and rec >= 1
)) x left outer join
(select count(*) as y_cnt
from (select distinct col1, col2, col3
from table_b
)) y on x.x_cnt=y.y_cnt) match_tbl
where match_tbl.match='Match'
i'm guessing the best solution here is to do it in the script.
i.e store the result of count(*) in variables, then compare them. nzsql has command line options to only return the result data of a single query.
If it must be done in plain SQL, a horribly, horrible kludge that will work is to use divide-by-zero. It's ugly but I've used it before when testing stuff. off the top of my head:
with
subq_x as select count(*) c1 .... ,
subq_y as select count(*) c2 ...
select (case when (subq_x.c1 != subq_y.c1) then 1/0 else 1 end) counts_match;
Did I mention this is ugly ?

Counting null and non-null values in a single query

I have a table
create table us
(
a number
);
Now I have data like:
a
1
2
3
4
null
null
null
8
9
Now I need a single query to count null and not null values in column a
This works for Oracle and SQL Server (you might be able to get it to work on another RDBMS):
select sum(case when a is null then 1 else 0 end) count_nulls
, count(a) count_not_nulls
from us;
Or:
select count(*) - count(a), count(a) from us;
If I understood correctly you want to count all NULL and all NOT NULL in a column...
If that is correct:
SELECT count(*) FROM us WHERE a IS NULL
UNION ALL
SELECT count(*) FROM us WHERE a IS NOT NULL
Edited to have the full query, after reading the comments :]
SELECT COUNT(*), 'null_tally' AS narrative
FROM us
WHERE a IS NULL
UNION
SELECT COUNT(*), 'not_null_tally' AS narrative
FROM us
WHERE a IS NOT NULL;
Here is a quick and dirty version that works on Oracle :
select sum(case a when null then 1 else 0) "Null values",
sum(case a when null then 0 else 1) "Non-null values"
from us
for non nulls
select count(a)
from us
for nulls
select count(*)
from us
minus
select count(a)
from us
Hence
SELECT COUNT(A) NOT_NULLS
FROM US
UNION
SELECT COUNT(*) - COUNT(A) NULLS
FROM US
ought to do the job
Better in that the column titles come out correct.
SELECT COUNT(A) NOT_NULL, COUNT(*) - COUNT(A) NULLS
FROM US
In some testing on my system, it costs a full table scan.
As i understood your query, You just run this script and get Total Null,Total NotNull rows,
select count(*) - count(a) as 'Null', count(a) as 'Not Null' from us;
usually i use this trick
select sum(case when a is null then 0 else 1 end) as count_notnull,
sum(case when a is null then 1 else 0 end) as count_null
from tab
group by a
Just to provide yet another alternative, Postgres 9.4+ allows applying a FILTER to aggregates:
SELECT
COUNT(*) FILTER (WHERE a IS NULL) count_nulls,
COUNT(*) FILTER (WHERE a IS NOT NULL) count_not_nulls
FROM us;
SQLFiddle: http://sqlfiddle.com/#!17/80a24/5
This is little tricky. Assume the table has just one column, then the Count(1) and Count(*) will give different values.
set nocount on
declare #table1 table (empid int)
insert #table1 values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(NULL),(11),(12),(NULL),(13),(14);
select * from #table1
select COUNT(1) as "COUNT(1)" from #table1
select COUNT(empid) "Count(empid)" from #table1
Query Results
As you can see in the image, The first result shows the table has 16 rows. out of which two rows are NULL. So when we use Count(*) the query engine counts the number of rows, So we got count result as 16. But in case of Count(empid) it counted the non-NULL-values in the column empid. So we got the result as 14.
so whenever we are using COUNT(Column) make sure we take care of NULL values as shown below.
select COUNT(isnull(empid,1)) from #table1
will count both NULL and Non-NULL values.
Note: Same thing applies even when the table is made up of more than one column. Count(1) will give total number of rows irrespective of NULL/Non-NULL values. Only when the column values are counted using Count(Column) we need to take care of NULL values.
I had a similar issue: to count all distinct values, counting null values as 1, too. A simple count doesn't work in this case, as it does not take null values into account.
Here's a snippet that works on SQL and does not involve selection of new values.
Basically, once performed the distinct, also return the row number in a new column (n) using the row_number() function, then perform a count on that column:
SELECT COUNT(n)
FROM (
SELECT *, row_number() OVER (ORDER BY [MyColumn] ASC) n
FROM (
SELECT DISTINCT [MyColumn]
FROM [MyTable]
) items
) distinctItems
Try this..
SELECT CASE
WHEN a IS NULL THEN 'Null'
ELSE 'Not Null'
END a,
Count(1)
FROM us
GROUP BY CASE
WHEN a IS NULL THEN 'Null'
ELSE 'Not Null'
END
Here are two solutions:
Select count(columnname) as countofNotNulls, count(isnull(columnname,1))-count(columnname) AS Countofnulls from table name
OR
Select count(columnname) as countofNotNulls, count(*)-count(columnname) AS Countofnulls from table name
Try
SELECT
SUM(ISNULL(a)) AS all_null,
SUM(!ISNULL(a)) AS all_not_null
FROM us;
Simple!
If you're using MS Sql Server...
SELECT COUNT(0) AS 'Null_ColumnA_Records',
(
SELECT COUNT(0)
FROM your_table
WHERE ColumnA IS NOT NULL
) AS 'NOT_Null_ColumnA_Records'
FROM your_table
WHERE ColumnA IS NULL;
I don't recomend you doing this... but here you have it (in the same table as result)
use ISNULL embedded function.
All the answers are either wrong or extremely out of date.
The simple and correct way of doing this query is using COUNT_IF function.
SELECT
COUNT_IF(a IS NULL) AS nulls,
COUNT_IF(a IS NOT NULL) AS not_nulls
FROM
us
SELECT SUM(NULLs) AS 'NULLS', SUM(NOTNULLs) AS 'NOTNULLs' FROM
(select count(*) AS 'NULLs', 0 as 'NOTNULLs' FROM us WHERE a is null
UNION select 0 as 'NULLs', count(*) AS 'NOTNULLs' FROM us WHERE a is not null) AS x
It's fugly, but it will return a single record with 2 cols indicating the count of nulls vs non nulls.
This works in T-SQL. If you're just counting the number of something and you want to include the nulls, use COALESCE instead of case.
IF OBJECT_ID('tempdb..#us') IS NOT NULL
DROP TABLE #us
CREATE TABLE #us
(
a INT NULL
);
INSERT INTO #us VALUES (1),(2),(3),(4),(NULL),(NULL),(NULL),(8),(9)
SELECT * FROM #us
SELECT CASE WHEN a IS NULL THEN 'NULL' ELSE 'NON-NULL' END AS 'NULL?',
COUNT(CASE WHEN a IS NULL THEN 'NULL' ELSE 'NON-NULL' END) AS 'Count'
FROM #us
GROUP BY CASE WHEN a IS NULL THEN 'NULL' ELSE 'NON-NULL' END
SELECT COALESCE(CAST(a AS NVARCHAR),'NULL') AS a,
COUNT(COALESCE(CAST(a AS NVARCHAR),'NULL')) AS 'Count'
FROM #us
GROUP BY COALESCE(CAST(a AS NVARCHAR),'NULL')
Building off of Alberto, I added the rollup.
SELECT [Narrative] = CASE
WHEN [Narrative] IS NULL THEN 'count_total' ELSE [Narrative] END
,[Count]=SUM([Count]) FROM (SELECT COUNT(*) [Count], 'count_nulls' AS [Narrative]
FROM [CrmDW].[CRM].[User]
WHERE [EmployeeID] IS NULL
UNION
SELECT COUNT(*), 'count_not_nulls ' AS narrative
FROM [CrmDW].[CRM].[User]
WHERE [EmployeeID] IS NOT NULL) S
GROUP BY [Narrative] WITH CUBE;
SELECT
ALL_VALUES
,COUNT(ALL_VALUES)
FROM(
SELECT
NVL2(A,'NOT NULL','NULL') AS ALL_VALUES
,NVL(A,0)
FROM US
)
GROUP BY ALL_VALUES
select count(isnull(NullableColumn,-1))
if its mysql, you can try something like this.
select
(select count(*) from TABLENAME WHERE a = 'null') as total_null,
(select count(*) from TABLENAME WHERE a != 'null') as total_not_null
FROM TABLENAME
Just in case you wanted it in a single record:
select
(select count(*) from tbl where colName is null) Nulls,
(select count(*) from tbl where colName is not null) NonNulls
;-)
for counting not null values
select count(*) from us where a is not null;
for counting null values
select count(*) from us where a is null;
I created the table in postgres 10 and both of the following worked:
select count(*) from us
and
select count(a is null) from us
In my case I wanted the "null distribution" amongst multiple columns:
SELECT
(CASE WHEN a IS NULL THEN 'NULL' ELSE 'NOT-NULL' END) AS a_null,
(CASE WHEN b IS NULL THEN 'NULL' ELSE 'NOT-NULL' END) AS b_null,
(CASE WHEN c IS NULL THEN 'NULL' ELSE 'NOT-NULL' END) AS c_null,
...
count(*)
FROM us
GROUP BY 1, 2, 3,...
ORDER BY 1, 2, 3,...
As per the '...' it is easily extendable to more columns, as many as needed
Number of elements where a is null:
select count(a) from us where a is null;
Number of elements where a is not null:
select count(a) from us where a is not null;