Count the Null columns in a row in SQL - sql

I was wondering whether it is possible to count the NULL columns of a row in SQL. I have a table Customer with nullable columns, and I simply want a query that returns an int: the number of NULL columns for a certain row (a certain customer).

This method assigns a 1 or 0 for null columns, and adds them all together. Hopefully you don't have too many nullable columns to add up here...
SELECT
((CASE WHEN col1 IS NULL THEN 1 ELSE 0 END)
+ (CASE WHEN col2 IS NULL THEN 1 ELSE 0 END)
+ (CASE WHEN col3 IS NULL THEN 1 ELSE 0 END)
...
...
+ (CASE WHEN col10 IS NULL THEN 1 ELSE 0 END)) AS sum_of_nulls
FROM table
WHERE Customer=some_cust_id
Note, you can also do this perhaps a little more syntactically cleanly with IF() if your RDBMS supports it.
SELECT
(IF(col1 IS NULL, 1, 0)
+ IF(col2 IS NULL, 1, 0)
+ IF(col3 IS NULL, 1, 0)
...
...
+ IF(col10 IS NULL, 1, 0)) AS sum_of_nulls
FROM table
WHERE Customer=some_cust_id
I tested this pattern against a table and it appears to work properly.

My answer builds on Michael Berkowski's answer, but to avoid having to type out hundreds of column names, what I did was this:
Step 1: Get a list of all of the columns in your table
SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'myTable';
Step 2: Paste the list in Notepad++ (any editor that supports regular expression replacement will work). Then use this replacement pattern
Search:
^(.*)$
Replace:
\(CASE WHEN \1 IS NULL THEN 1 ELSE 0 END\) +
Step 3: Prepend SELECT identityColumnName, and change the very last + to AS NullCount FROM myTable and optionally add an ORDER BY...
SELECT
identityColumnName,
(CASE WHEN column001 IS NULL THEN 1 ELSE 0 END) +
-- ...
(CASE WHEN column200 IS NULL THEN 1 ELSE 0 END) AS NullCount
FROM
myTable
ORDER BY
NullCount DESC
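The editor steps above can also be scripted. Below is a minimal sketch in Python using the standard sqlite3 module, assuming SQLite (where column names come from PRAGMA table_info rather than INFORMATION_SCHEMA.COLUMNS). The customer table, its columns and the helper name null_count_sql are hypothetical.

```python
import sqlite3

def null_count_sql(conn, table, id_column):
    """Build a SELECT that counts the NULL columns of every row in `table`."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    terms = [f'(CASE WHEN "{c}" IS NULL THEN 1 ELSE 0 END)' for c in columns]
    return (f'SELECT "{id_column}", ' + " + ".join(terms) +
            f' AS NullCount FROM "{table}" ORDER BY NullCount DESC')

# Hypothetical sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, phone TEXT, email TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)",
                 [(1, "Ann", None, None),
                  (2, "Bob", "555", "b@x.org"),
                  (3, None, None, None)])
rows = conn.execute(null_count_sql(conn, "customer", "id")).fetchall()
print(rows)  # [(3, 3), (1, 2), (2, 0)]
```

The table and column names are interpolated into the statement, so this should only be used with names you control.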

For ORACLE-DBMS only.
You can use the NVL2 function:
NVL2( string1, value_if_not_null, value_if_null )
Here is a SELECT with a similar approach to the one Michael Berkowski suggested:
SELECT (NVL2(col1, 0, 1)
+ NVL2(col2, 0, 1)
+ NVL2(col3, 0, 1)
...
...
+ NVL2(col10, 0, 1)
) AS sum_of_nulls
FROM table
WHERE Customer=some_cust_id
A more generic approach would be to write a PL/SQL-block and use dynamic SQL. You have to build a SELECT string with the NVL2 method from above for every column in the all_tab_columns of a specific table.

Unfortunately, in a standard SQL statement you have to enter each column you want to test; to test them all programmatically you could use T-SQL. A word of warning, though: make sure you are working with genuine NULLs. A column can hold a blank stored value that the database will not recognise as a true NULL (I know this sounds strange).
You can avoid this by capturing the blank values and the NULLS in a statement like this:
CASE WHEN col1 & '' = '' THEN 1 ELSE 0 END
Or in some databases such as Oracle (not sure if there are any others) you would use:
CASE WHEN col1 || '' = '' THEN 1 ELSE 0 END
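To make the difference between a blank value and a true NULL concrete, here is a small sketch in Python with sqlite3. It assumes SQLite and swaps in COALESCE(col, '') = '' for the concatenation trick, since concatenation behaves differently across databases. Table t and its data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col1 TEXT, col2 TEXT)")
# Row 1 holds one true NULL and one empty string.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, None, ""), (2, "x", None), (3, "x", "y")])

query = """
SELECT id,
       (CASE WHEN col1 IS NULL THEN 1 ELSE 0 END)
     + (CASE WHEN col2 IS NULL THEN 1 ELSE 0 END) AS true_nulls,
       (CASE WHEN COALESCE(col1, '') = '' THEN 1 ELSE 0 END)
     + (CASE WHEN COALESCE(col2, '') = '' THEN 1 ELSE 0 END) AS nulls_or_blanks
FROM t ORDER BY id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 1, 2), (2, 1, 1), (3, 0, 0)]
```

For row 1 the plain IS NULL test reports one NULL, while the COALESCE variant also counts the empty string.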

You don't state RDBMS. For SQL Server 2008...
SELECT CustomerId,
(SELECT COUNT(*) - COUNT(C)
FROM (VALUES(CAST(Col1 AS SQL_VARIANT)),
(Col2),
/*....*/
(Col9),
(Col10)) T(C)) AS NumberOfNulls
FROM Customer
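The query above relies on COUNT(*) counting every row while COUNT(C) skips NULLs, so their difference is the number of NULLs. A minimal sketch of just that principle, in Python with sqlite3; the one-column table unpivoted is hypothetical and stands in for the VALUES list of one customer's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per column value of a single (hypothetical) customer, already unpivoted.
conn.execute("CREATE TABLE unpivoted (c TEXT)")
conn.executemany("INSERT INTO unpivoted VALUES (?)",
                 [("a",), (None,), ("b",), (None,), (None,)])

# COUNT(*) counts all rows, COUNT(c) only the rows where c is not NULL.
total, non_null, nulls = conn.execute(
    "SELECT COUNT(*), COUNT(c), COUNT(*) - COUNT(c) FROM unpivoted").fetchone()
print(total, non_null, nulls)  # 5 2 3
```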

Depending on what you want to do, and if you ignore mavens, and if you use SQL Server 2012, you could do it another way.
The total number of candidate columns ("slots") must be known.
1. Select all the known "slots" column by column (they're known).
2. Unpivot that result to get a table with one row per original column. This works because the null columns don't unpivot, and you know all the column names.
3. Count(*) the result to get the number of non-nulls; subtract that from the number of slots to get your answer.
Like this, for 4 "seats" in a car
select 'empty seats' = 4 - count(*)
from
(
select carId, seat1,seat2,seat3,seat4 from cars where carId = @carId
) carSpec
unpivot (FieldValue FOR seat in ([seat1],[seat2],[seat3],[seat4])) AS results
This is useful if you may need to do more later than just count the number of non-null columns, as it gives you a way to manipulate the columns as a set too.

This will give you the number of columns which are not null; subtract it from the total number of columns to get the number of nulls.
SELECT ISNULL(COUNT(col1),'') + ISNULL(COUNT(col2),'') +ISNULL(COUNT(col3),'')
FROM TABLENAME
WHERE ID=1

The below script gives you the NULL value count within a row i.e. how many columns do not have values.
SELECT
*,
(SELECT COUNT(*)
FROM (VALUES (Tab.Col1)
,(Tab.Col2)
,(Tab.Col3)
,(Tab.Col4)) InnerTab(Col)
WHERE Col IS NULL) NullColumnCount
FROM (VALUES(1,2,3,4)
,(NULL,2,NULL,4)
,(1,NULL,NULL,NULL)) Tab(Col1,Col2,Col3,Col4)
Just to demonstrate I am using an inline table in my example.
Try to cast or convert all column values to a common type; this helps when the columns you want to compare have different types.

I haven't tested it yet, but I'd try to do it using a PL/SQL function:
CREATE OR REPLACE TYPE ANYARRAY AS TABLE OF ANYDATA
;
CREATE OR REPLACE Function COUNT_NULL
( ARR IN ANYARRAY )
RETURN number
IS
cnumber number := 0; -- start at 0; NULL + 1 would stay NULL
BEGIN
for i in 1 .. ARR.count loop
if ARR(i).column_value is null then
cnumber := cnumber + 1;
end if;
end loop;
RETURN cnumber;
EXCEPTION
WHEN OTHERS THEN
raise_application_error
(-20001,'An error was encountered - '
||SQLCODE||' -ERROR- '||SQLERRM);
END
;
Then use it in a select query like this
CREATE TABLE TEST (A NUMBER, B NUMBER, C NUMBER);
INSERT INTO TEST VALUES (NULL,NULL,NULL);
INSERT INTO TEST VALUES (1 ,NULL,NULL);
INSERT INTO TEST VALUES (1 ,2 ,NULL);
INSERT INTO TEST VALUES (1 ,2 ,3 );
SELECT ROWNUM,COUNT_NULL(A,B,C) AS NULL_COUNT FROM TEST;
Expected output
ROWNUM | NULL_COUNT
-------+-----------
1 | 3
2 | 2
3 | 1
4 | 0

This is how I tried it:
CREATE TABLE #temptablelocal (id int NOT NULL, column1 varchar(10) NULL, column2 varchar(10) NULL, column3 varchar(10) NULL, column4 varchar(10) NULL, column5 varchar(10) NULL, column6 varchar(10) NULL);
INSERT INTO #temptablelocal
VALUES (1,
NULL,
'a',
NULL,
'b',
NULL,
'c')
SELECT *
FROM #temptablelocal
WHERE id =1
SELECT count(1) countnull
FROM
(SELECT a.ID,
b.column_title,
column_val = CASE b.column_title
WHEN 'column1' THEN a.column1
WHEN 'column2' THEN a.column2
WHEN 'column3' THEN a.column3
WHEN 'column4' THEN a.column4
WHEN 'column5' THEN a.column5
WHEN 'column6' THEN a.column6
END
FROM
( SELECT id,
column1,
column2,
column3,
column4,
column5,
column6
FROM #temptablelocal
WHERE id =1 ) a
CROSS JOIN
( SELECT 'column1'
UNION ALL SELECT 'column2'
UNION ALL SELECT 'column3'
UNION ALL SELECT 'column4'
UNION ALL SELECT 'column5'
UNION ALL SELECT 'column6' ) b (column_title) ) AS pop WHERE column_val IS NULL
DROP TABLE #temptablelocal

Similarly, but dynamically:
drop table if exists myschema.table_with_nulls;
create table myschema.table_with_nulls as
select
n1::integer,
n2::integer,
n3::integer,
n4::integer,
c1::character varying,
c2::character varying,
c3::character varying,
c4::character varying
from
(
values
(1,2,3,4,'a','b','c','d'),
(1,2,3,null,'a','b','c',null),
(1,2,null,null,'a','b',null,null),
(1,null,null,null,'a',null,null,null)
) as test_records(n1, n2, n3, n4, c1, c2, c3, c4);
drop function if exists myschema.count_nulls(varchar,varchar);
create function myschema.count_nulls(schemaname varchar, tablename varchar) returns void as
$BODY$
declare
calc varchar;
sqlstring varchar;
begin
select
array_to_string(array_agg('(' || trim(column_name) || ' is null)::integer'),' + ')
into
calc
from
information_schema.columns
where
table_schema in ('myschema')
and table_name in ('table_with_nulls');
sqlstring = 'create temp view count_nulls as select *, ' || calc || '::integer as count_nulls from myschema.table_with_nulls';
execute sqlstring;
return;
end;
$BODY$ LANGUAGE plpgsql STRICT;
select * from myschema.count_nulls('myschema'::varchar,'table_with_nulls'::varchar);
select
*
from
count_nulls;
Though I see that I didn't finish parameterising the function: the schema and table names are still hard-coded in its body.

My answer builds on Drew Chapin's answer, but with changes to get the result using a single script:
use <add_database_here>;
Declare @val Varchar(MAX);
Select @val = COALESCE(@val + str, str) From
(SELECT
'(CASE WHEN '+COLUMN_NAME+' IS NULL THEN 1 ELSE 0 END) +' str
FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '<add table name here>'
) t1 -- getting column names and building a CASE WHEN per column that maps NULL to 1 and anything else to 0
Select @val = SUBSTRING(@val,1,LEN(@val) - 1) -- removing the trailing plus sign
Select @val = 'SELECT <add_identity_column_here>, ' + @val + ' AS NullCount FROM <add table name here>' -- adding the 'select' for the column identity, the 'alias' for the null count column, and the 'from'
EXEC (@val) --executing the resulting sql

With ORACLE:
Number_of_columns - json_value( json_array( comma separated list of columns ), '$.size()' ) from your_table
json_array builds an array containing only the non-null columns, and the json_value expression returns the size of that array.

There isn't a straightforward way of doing so like there would be with counting rows. Basically, you have to enumerate all the columns that might be null in one expression.
So for a table with possibly null columns a, b, c, you could do this:
SELECT key_column, (CASE WHEN a IS NULL THEN 1 ELSE 0 END) + (CASE WHEN b IS NULL THEN 1 ELSE 0 END)
+ (CASE WHEN c IS NULL THEN 1 ELSE 0 END) null_col_count
FROM my_table

Related

Microsoft Sql Case stmt using stuff

I am sending a list of int converted to string delimited by comma to the db.
On the db, I need to check whether this string parameter is null. If it is null, I need to select all the rows from the table; if it is not null, I select the rows that match the string parameter, which is split into a table using a SQL split function. Below is my query. Any suggestions on how this could be queried better?
MM.CheckID IN
(select * from
dbo.fnSplitString((CASE WHEN @listCheckID IS NULL
THEN
(STUFF(( SELECT v.CheckID FROM (SELECT CheckID FROM TABLE1) AS v
ORDER BY v.CheckID
FOR XML PATH('')), 1, 1, ';') + ';') ELSE @listCheckID end), ';'))
Initially I had the query in Method 1, which gave me the error 'select query returned multiple rows..'. Later I changed it to the query in Method 2, which gives me a table of split values, but I still don't think it is a good solution.
Method 1:
CREATE PROCEDURE [dbo].[GetInfo]
@CheckID INT
AS
BEGIN
SELECT
MM.CheckId,
MM.Dept,
MM.Name
FROM
Table1 MM
WHERE
(MM.CheckId IN
(CASE WHEN @CheckID IS NULL THEN (SELECT CheckID from Table1)
ELSE (SELECT * FROM dbo.fnSplitString(@CheckID, ';')) END)
)
END
Method 2:
CREATE PROCEDURE [dbo].[GetInfo]
@CheckID INT
AS
BEGIN
SELECT
MM.CheckId,
MM.Dept,
MM.Name
FROM
Table1 MM
WHERE
(MM.CheckId IN
(SELECT * FROM dbo.fnSplitString( (CASE WHEN @CheckID IS NULL THEN
(STUFF(( SELECT ';' + cast(v.CheckID AS VARCHAR(50))
FROM (SELECT CheckID FROM Table1) AS v
FOR XML PATH('')), 1, 1, ';') + ';')
ELSE @CheckID end), ';'))
)
END
I think you are overcomplicating this; this is all you need:
Where (MM.CheckID IN (SELECT CheckID FROM TABLE1) and @listCheckID is null)
or MM.CheckID = @listCheckID
The condition (MM.CheckID IN (SELECT CheckID FROM TABLE1) and @listCheckID is null) applies only when the @listCheckID variable is NULL, because of the AND operator in the middle.
When the @listCheckID variable is NOT NULL, the or MM.CheckID = @listCheckID condition pulls the rows that match the variable.
Post the full query and you might get an even better solution.
Update: it seems like this is all you need:
SELECT MM.CheckId,
MM.Dept,
MM.NAME
FROM Table1 MM
WHERE MM.CheckId IN (SELECT * --select the column name
FROM dbo.Fnsplitstring(@CheckID, ';'))
OR @CheckID IS NULL -- to return all the records when @CheckID is NULL
No idea what you are doing in that snippet. Typically, you use the following approach:
where MM.CheckID in (select ? from dbo.fnSplitString(@listCheckID))
or @listCheckID is null
...;
The question mark is a placeholder - replace it with the name of the column from the table returned by the split function. And don't start lazy/bad programming practices by using the asterisk (*) to select every column - even if the table has only one column. Things change over time; make your code resilient.
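The "match the list, or return everything when the parameter is NULL" pattern from the answers above can be tried out quickly. This is only a sketch in Python with sqlite3: SQLite has no dbo.fnSplitString, so an instr() match on the delimited string is swapped in for the split function, and the sample rows and the get_info helper are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (CheckId INTEGER, Dept TEXT, Name TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)",
                 [(1, "HR", "Ann"), (2, "IT", "Bob"), (3, "IT", "Cy")])

# :ids is a ';'-delimited list such as '1;3', or NULL to return every row.
# Wrapping both sides in ';' prevents '1' from matching inside '11'.
QUERY = """
SELECT CheckId FROM Table1
WHERE :ids IS NULL
   OR instr(';' || :ids || ';', ';' || CheckId || ';') > 0
ORDER BY CheckId
"""

def get_info(ids):
    """Return the matching CheckId values; ids=None means no filter."""
    return [r[0] for r in conn.execute(QUERY, {"ids": ids})]

print(get_info(None))   # [1, 2, 3]
print(get_info("1;3"))  # [1, 3]
```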

how to select columns by column index in sql? [duplicate]

Is there a way to access columns by their index within a stored procedure in SQL Server?
The purpose is to compute lots of columns. I was reading about cursors, but I do not know how to apply them.
Let me explain my problem:
I have a row like:
field_1 field_2 field_3 field_4 ...field_d Sfield_1 Sfield_2 Sfield_3...Sfield_n
1 2 3 4 d 10 20 30 n
I need to compute something like (field_1*field1) - (Sfield_1* Sfiled_1) / more...
So the result is stored in a table column d times.
So the result is a d column * d row table.
As the number of columns is variable, I was considering making dynamic SQL, getting the names of columns in a string and splitting the ones I need, but this approach makes the problem harder. I thought getting the column number by index could make life easier.
No, you can not use the ordinal (numeric) position in the SELECT clause.
Only in the ORDER BY clause can you use the ordinal position, because it's based on the column(s) specified in the SELECT clause.
First, as OMG Ponies stated, you cannot reference columns by their ordinal position. This is not an accident. The SQL specification is not built for dynamic schema either in DDL or DML.
Given that, I have to wonder why you have your data structured as you do. A sign of a mismatch between schema and the problem domain rears itself when you try to extract information. When queries are incredibly cumbersome to write, it is an indication that the schema does not properly model the domain for which it was designed.
However, be that as it may, given what you have told us, an alternate solution would be something like the following: (I'm assuming that field_1*field1 was meant to be field_1 * field_1 or field_1 squared or Power( field_1, 2 ) )
Select 1 As Sequence, field_1 As [Field], Sfield_1 As [SField], Sfiled_1 As [SFiled]
Union All Select 2, field_2, Sfield_2, Sfiled_2
...
Union All Select n, field_n, Sfield_n, Sfiled_n
Now your query looks like:
With Inputs As
(
Select 1 As Sequence, field_1 As [Field], Sfield_1 As [SField], Sfiled_1 As [SFiled]
Union All Select 2, field_2, Sfield_2, Sfiled_2
....
)
, Results As
(
Select Case
When Sequence = 1 Then Power( [Field], 2 ) - ( [SField] * [SFiled] )
Else 1 / Power( [Field], 2 ) - ( [SField] * [SFiled] )
End
As Result
From Inputs
)
Select Exp( Sum( Log( Result ) ) )
From Results
This might not be the most elegant or efficient but it works. I am using it to create a new table for faster mappings between data that I need to parse through all the columns / rows.
DECLARE @sqlCommand varchar(1000)
DECLARE @columnNames TABLE (colName varchar(64), colIndex int)
DECLARE @TableName varchar(64) = 'YOURTABLE' --Table Name
DECLARE @rowNumber int = 2 -- y axis
DECLARE @colNumber int = 24 -- x axis
DECLARE @myColumnToOrderBy varchar(64) = 'ID' --use primary key
--Store column names and their ordinal positions in a table variable
INSERT INTO @columnNames (colName, colIndex)
SELECT COL.name AS ColumnName, ROW_NUMBER() OVER (ORDER BY COL.column_id)
FROM sys.tables AS TAB
INNER JOIN sys.columns AS COL ON COL.object_id = TAB.object_id
WHERE TAB.name = @TableName;
DECLARE @colName varchar(64)
SELECT @colName = colName FROM @columnNames WHERE colIndex = @colNumber
--Create Dynamic Query to retrieve the x,y coordinates from table
SET @sqlCommand = 'SELECT ' + @colName + ' FROM (SELECT ' + @colName + ', ROW_NUMBER() OVER (ORDER BY ' + @myColumnToOrderBy+ ') AS RowNum FROM ' + @TableName + ') t2 WHERE RowNum = ' + CAST(@rowNumber AS varchar(5))
EXEC(@sqlCommand)
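The same idea (resolve the column name from its ordinal position first, then build the statement) can be sketched outside T-SQL. The following Python example assumes SQLite, where PRAGMA table_info plays the role of sys.columns and LIMIT/OFFSET replaces ROW_NUMBER(). The grid table and the value_at helper are hypothetical.

```python
import sqlite3

def value_at(conn, table, col_index, row_number, order_by):
    """Return the value at 1-based (row_number, col_index), or None if out of range."""
    # PRAGMA table_info lists columns in declaration order: (cid, name, type, ...).
    names = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
    col = names[col_index - 1]
    # Identifiers cannot be bound as parameters, so they are quoted and interpolated;
    # the row offset is bound normally.
    sql = f'SELECT "{col}" FROM "{table}" ORDER BY "{order_by}" LIMIT 1 OFFSET ?'
    row = conn.execute(sql, (row_number - 1,)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grid (id INTEGER PRIMARY KEY, field_1 INTEGER, field_2 INTEGER)")
conn.executemany("INSERT INTO grid VALUES (?, ?, ?)", [(1, 10, 20), (2, 30, 40)])
print(value_at(conn, "grid", 3, 2, "id"))  # 40
```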

SQL How do I find values from a list that are not in a table

I have a table with values in a field called 'code'.
ABC
DFG
CDF
How would I select all codes that are not in the table from a list I have?
Eg:
SELECT * from [my list] where table1.code not in [my list]
the list is not in a table.
The list would be something like "ABC","BBB","TTT" (As strings)
Try this:
SELECT code
FROM Table1
WHERE code NOT IN ('ABC','CCC','DEF') --values from your list
It will result:
DFG
CDF
If the list is in another table, try this:
SELECT code
FROM Table1
WHERE code NOT IN (SELECT code FROM Table2)
As per your requirement, try this:
SELECT list
FROM Table2
WHERE list NOT IN (SELECT code from table1)
It will select the list values that are not in code.
See an example in SQL Fiddle
The key point of the question is to turn the source data "ABC","BBB","TTT" into a table.
That table will look like this:
+---+
|val|
+---+
|ABC|
|BBB|
|TTT|
SQLite has no built-in split function, so turning your list into a table is a little harder.
You can use a recursive CTE to emulate a split function.
You need the replace function to remove the " double quotes from your source data.
There are two columns in the CTE:
val column: carries your list data
rest column: holds the part of the string that has not been split yet
You will get a table from the CTE like this:
+---+
|val|
+---+
|ABC|
|BBB|
|TTT|
Then you can compare the data with table1 using NOT IN:
WITH RECURSIVE split(val, rest) AS (
SELECT '', replace('"ABC","BBB","TTT"','"','') || ','
UNION ALL
SELECT
substr(rest, 0, instr(rest, ',')),
substr(rest, instr(rest, ',')+1)
FROM split
WHERE rest <> '')
SELECT * from (
SELECT val
FROM split
WHERE val <> ''
) t where t.val not IN (
select t1.code
from table1 t1
)
sqlfiddle:https://sqliteonline.com/#fiddle-5adeba5dfcc2fks5jgd7ernq
Output Result:
+---+
|val|
+---+
|BBB|
|TTT|
If you want to show it on one line, use the GROUP_CONCAT function.
WITH RECURSIVE split(val, rest) AS (
SELECT '', replace('"ABC","BBB","TTT"','"','') || ','
UNION ALL
SELECT
substr(rest, 0, instr(rest, ',')),
substr(rest, instr(rest, ',')+1)
FROM split
WHERE rest <> '')
SELECT GROUP_CONCAT(val,',') val from (
SELECT val
FROM split
WHERE val <> ''
) t where t.val not IN (
select t1.code
from table1 t1
)
Output Result:
BBB,TTT
sqlfiddle:https://sqliteonline.com/#fiddle-5adecb92fcc36ks5jgda15yq
Note: SELECT * from [my list] where table1.code not in [my list] cannot work, because the query never references table1 in its FROM clause, so the table1.code column is not available.
You can use NOT EXISTS or a JOIN to get what you expect.
sqlfiddle:https://sqliteonline.com/#fiddle-5adeba5dfcc2fks5jgd7ernq
Can you use common table expressions?
WITH temp(code) AS (VALUES('ABC'),('BBB'),('TTT'),(ETC...))
SELECT temp.code FROM temp WHERE temp.code NOT IN
(SELECT DISTINCT table1.code FROM table1);
This would allow you to create a temporary table defined with your list of strings within the VALUES statement. Then use standard SQL to select values NOT IN your table1.code column.
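Here is a minimal sketch of that CTE approach driven from Python with sqlite3, assuming SQLite. The list entries are bound as parameters instead of being pasted into the VALUES clause, and NOT EXISTS is swapped in for NOT IN so that a NULL in table1.code cannot empty the result. The helper name missing_codes and the CTE name wanted are made up.

```python
import sqlite3

def missing_codes(conn, codes):
    """Return the entries of `codes` (non-empty list) that have no row in table1."""
    # One "(?)" row per list entry, bound as parameters rather than pasted in.
    values = ", ".join("(?)" for _ in codes)
    sql = (f"WITH wanted(code) AS (VALUES {values}) "
           "SELECT w.code FROM wanted AS w "
           "WHERE NOT EXISTS (SELECT 1 FROM table1 AS t WHERE t.code = w.code) "
           "ORDER BY w.code")
    return [r[0] for r in conn.execute(sql, codes)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (code TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("ABC",), ("DFG",), ("CDF",)])
print(missing_codes(conn, ["ABC", "BBB", "TTT"]))  # ['BBB', 'TTT']
```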
Is this solution good, or am I missing something?
create table table10 (code varchar(20));
insert into table10 (code) values ('ABC');
insert into table10 (code) values ('DFG');
insert into table10 (code) values ('CDF');
select * from (
select 'ABC' as x
union all select 'BBB'
union all select 'TTT'
) t where t.x not in (select code from table10);
-- returns: BBB
-- TTT
See SQL Fiddle.
This can also be achieved using a stored function (MySQL):
DELIMITER //
drop function if exists testcsv
//
create function testcsv(csv varchar(255)) returns varchar(255)
deterministic
begin
declare pos, found int default 0;
declare this, notin varchar(255);
declare continue handler for not found set found = 0;
set notin = '';
repeat
set pos = instr(csv, ',');
if (pos = 0) then
set this = trim('"' from csv);
set csv = '';
else
set this = trim('"' from trim(substring(csv, 1, pos-1)));
set csv = substring(csv, pos+1);
end if;
select 1 into found from table1 where code = this;
if (not found) then
if (notin = '') then
set notin = this;
else
set notin = concat(notin, ',', this);
end if;
end if;
until csv = ''
end repeat;
return (notin);
end
//
select testcsv('"ABC","BBB","TTT","DFG"')
Output:
BBB,TTT

SQL: count number of distinct values in every column

I need a query that will return a table where each column is the count of distinct values in the columns of another table.
I know how to count the distinct values in one column:
select count(distinct columnA) from table1;
I suppose that I could just make this a really long select clause:
select count(distinct columnA), count(distinct columnB), ... from table1;
but that isn't very elegant and it's hardcoded. I'd prefer something more flexible.
This code should give you all the columns in 'table1' with the respective distinct value count for each one as data.
DECLARE @TableName VarChar (Max) = 'table1'
DECLARE @SqlString VarChar (Max)
set @SqlString = (
SELECT DISTINCT
'SELECT ' +
RIGHT (ColumnList, LEN (ColumnList)-1) +
' FROM ' + Table_Name
FROM INFORMATION_SCHEMA.COLUMNS COL1
CROSS AppLy (
SELECT ', COUNT (DISTINCT [' + COLUMN_NAME + ']) AS ' + '''' + COLUMN_NAME + ''''
FROM INFORMATION_SCHEMA.COLUMNS COL2
WHERE COL1.TABLE_NAME = COL2.TABLE_NAME
FOR XML PATH ('')
) TableColumns (ColumnList)
WHERE
1=1 AND
COL1.TABLE_NAME = @TableName
)
EXECUTE (@SqlString)
try this (sql server 2005 syntax):
DECLARE @YourTable table (col1 varchar(5)
,col2 int
,col3 datetime
,col4 char(3)
)
insert into @YourTable values ('abcdf',123,'1/1/2009','aaa')
insert into @YourTable values ('aaaaa',456,'1/2/2009','bbb')
insert into @YourTable values ('bbbbb',789,'1/3/2009','aaa')
insert into @YourTable values ('ccccc',789,'1/4/2009','bbb')
insert into @YourTable values ('aaaaa',789,'1/5/2009','aaa')
insert into @YourTable values ('abcdf',789,'1/6/2009','aaa')
;with RankedYourTable AS
(
SELECT
ROW_NUMBER() OVER(PARTITION by col1 order by col1) AS col1Rank
,ROW_NUMBER() OVER(PARTITION by col2 order by col2) AS col2Rank
,ROW_NUMBER() OVER(PARTITION by col3 order by col3) AS col3Rank
,ROW_NUMBER() OVER(PARTITION by col4 order by col4) AS col4Rank
FROM @YourTable
)
)
SELECT
SUM(CASE WHEN col1Rank=1 THEN 1 ELSE 0 END) AS col1DistinctCount
,SUM(CASE WHEN col2Rank=1 THEN 1 ELSE 0 END) AS col2DistinctCount
,SUM(CASE WHEN col3Rank=1 THEN 1 ELSE 0 END) AS col3DistinctCount
,SUM(CASE WHEN col4Rank=1 THEN 1 ELSE 0 END) AS col4DistinctCount
FROM RankedYourTable
OUTPUT:
col1DistinctCount col2DistinctCount col3DistinctCount col4DistinctCount
----------------- ----------------- ----------------- -----------------
4                 3                 6                 2
(1 row(s) affected)
"and it's hardcoded."
It is not hardcoding to provide a field list for a sql statement. It's common and acceptable practice.
This won't necessarily be possible for every field in a table. For example, you can't do a DISTINCT against a SQL Server ntext or image field unless you cast them to other data types and lose some precision.
I appreciate all of the responses. I think the solution that will work best for me in this situation (counting the number of distinct values in each column of a table from an external program that has no knowledge of the table except its name) is as follows:
Run "describe table1" and pull out the column names from the result.
Loop through the column names and create the query to count the distinct values in each column. The query will look something like "select count(distinct columnA), count(distinct columnB), ... from table1".
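The two steps above could look roughly like this from an external program. This is a sketch in Python with sqlite3, assuming SQLite, where PRAGMA table_info stands in for "describe table1"; the sample rows and the distinct_counts helper are hypothetical.

```python
import sqlite3

def distinct_counts(conn, table):
    """Return {column: number of distinct non-NULL values} for `table`."""
    # Step 1: read the column names (index 1 of each PRAGMA table_info row).
    names = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
    # Step 2: build one COUNT(DISTINCT ...) per column and run a single query.
    select_list = ", ".join(f'COUNT(DISTINCT "{n}")' for n in names)
    row = conn.execute(f'SELECT {select_list} FROM "{table}"').fetchone()
    return dict(zip(names, row))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (columnA TEXT, columnB INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [("x", 1), ("x", 2), ("y", 2), (None, 2)])
print(distinct_counts(conn, "table1"))  # {'columnA': 2, 'columnB': 2}
```

Note that the NULL in columnA is not counted, because COUNT(DISTINCT ...) ignores NULLs.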
Raj More's answer works well if you don't need to consider null as a value as count(distinct...) does not count null.
Here is a modification to count values including null by converting values to a string and replacing null with "NULL AS SOME IMPOSSIBLE STRING":
DECLARE @TableName VarChar (1024) = 'tableName'
DECLARE @SqlString VarChar (Max)
set @SqlString = (
SELECT DISTINCT
'SELECT ' +
RIGHT (ColumnList, LEN (ColumnList)-1) +
' FROM ' + Table_Name
FROM INFORMATION_SCHEMA.COLUMNS COL1
CROSS AppLy (
SELECT ', COUNT (DISTINCT coalesce(cast([' + COLUMN_NAME + '] as varchar),
''NULL AS SOME IMPOSSIBLE STRING'')) AS ' + '''' + COLUMN_NAME + ''''
FROM INFORMATION_SCHEMA.COLUMNS COL2
WHERE COL1.TABLE_NAME = COL2.TABLE_NAME
FOR XML PATH ('')
) TableColumns (ColumnList)
WHERE
COL1.TABLE_NAME = @TableName
)
EXECUTE (@SqlString)
DISTINCT is evil. Do COUNT/GROUP BY