Dynamically Count Null Values in SQL Server - sql

I'm a little new at SQL so please bear with me. I am attempting to write a query that will allow me to loop through an entire table and find the number of times null values appear in each column. This is easy to do the hard way by typing the following:
Select
SUM(CASE WHEN COL_1 IS NULL THEN 1 ELSE 0 END) AS COL_1_NULLS
,SUM(CASE WHEN COL_2 IS NULL THEN 1 ELSE 0 END) AS COL_2_NULLS
FROM TABLE1
This is easy but it can become arduous if you want to do this for multiple tables or if a single table has a lot of columns.
I'm looking for a way to write a query that takes a table name and then loops through each column in that table (possibly pulling the column name by ordinal position via a join to a metadata view?) and sums the number of nulls in the column. Before anyone jumps on the nitpick bandwagon please keep in mind that this basic idea could be used for more than just finding nulls. Any assistance with this issue is greatly appreciated.

You need to use dynamic sql:
declare @custom_sql varchar(max)
set @custom_sql = 'SELECT null as first_row'
select
@custom_sql = @custom_sql + ', ' + 'SUM(CASE WHEN ' + COLUMN_NAME + ' IS NULL THEN 1 ELSE 0 END) as ' + COLUMN_NAME + '_NULLS'
from
INFORMATION_SCHEMA.COLUMNS where table_name = 'MYTABLE'
set @custom_sql = @custom_sql + ' FROM MYTABLE'
exec(@custom_sql)
You can also use the COALESCE term (just for a slightly different approach):
declare @custom_sql varchar(max)
select
@custom_sql = COALESCE(@custom_sql + ', ', '') + 'SUM(CASE WHEN ' + COLUMN_NAME + ' IS NULL THEN 1 ELSE 0 END) as ' + COLUMN_NAME + '_NULLS'
from
INFORMATION_SCHEMA.COLUMNS where table_name = 'users'
set @custom_sql = 'SELECT ' + @custom_sql
set @custom_sql = @custom_sql + ' FROM Users'
print @custom_sql
exec(@custom_sql)

I don't know how to make a generic query, but you can always generate the script like this:
declare @sql nvarchar(max) = 'select 1 as dummy'
select @sql = @sql + '
, sum(case when [' + c.name + '] is null then 1 else 0 end) as [' + c.name + '_NULLS]'
from sys.columns c
join sys.tables t on t.object_id = c.object_id
where t.name = 'TABLE1'
set @sql = @sql + ' from TABLE1'
select @sql
Then you can execute the result, e.g. with exec sp_executesql @sql

For a cooler approach, you can use ISNULL to skip the first comma.
declare @sql nvarchar(max)
declare @tablename nvarchar(255) = 'xxxx'
Select @sql = ISNULL(@SQL + ',','') + ' ' + COLUMN_NAME + '_count = Sum(case when ' + COLUMN_NAME + ' is null then 1 else 0 end)' + char(13)
From information_schema.columns
where table_name = @tablename
set @sql = 'Select' + @sql + ' From ' + @tablename
print @sql
exec sp_executesql @sql
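The answers above concatenate raw column names, which can break on names that contain spaces or reserved words, and they do not filter by schema. A minimal sketch of the same technique with QUOTENAME and sys.columns is shown here. The table name dbo.MyTable is a placeholder and the variable names are illustrative, not taken from any of the answers.

```sql
-- Sketch only: dbo.MyTable is a placeholder, replace it with a real table.
DECLARE @table nvarchar(517) = N'dbo.MyTable';
DECLARE @sql   nvarchar(max);

-- Build one SUM(CASE ...) expression per column. QUOTENAME brackets each
-- identifier so unusual column names do not break the generated statement.
-- Note: QUOTENAME returns NULL for input longer than 128 characters, so
-- column names longer than 122 characters would need a shorter alias.
SELECT @sql = ISNULL(@sql + N',' + NCHAR(13) + NCHAR(10), N'SELECT ')
            + N'SUM(CASE WHEN ' + QUOTENAME(c.name)
            + N' IS NULL THEN 1 ELSE 0 END) AS '
            + QUOTENAME(c.name + N'_NULLS')
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(@table);

-- @sql is still NULL when the table was not found, so guard before running.
IF @sql IS NOT NULL
BEGIN
    SET @sql = @sql + N' FROM ' + @table;
    PRINT @sql;                  -- inspect the generated statement
    EXEC sys.sp_executesql @sql;
END
```

Resolving the table through OBJECT_ID means a schema-qualified name picks exactly one table, whereas filtering on table_name alone can mix columns from same-named tables in different schemas.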

Related

Dynamic union of table if a certain field exists

I'm trying to build a dynamic union over tables that have certain fields (in my example field1 and field2). The union already works but over any table. Now I need to include only the ones that have field1 and field2.
DECLARE @SQL VARCHAR(max)
SET @SQL = ''
SELECT @SQL = @SQL + CASE Len(@SQL) WHEN 0 THEN '' ELSE ' UNION ALL ' END
+ ' SELECT [field1], [field2] FROM dbo.['
+ NAME + ']'
FROM sys.tables
WHERE NAME LIKE 'CUST_TABLE%'
EXEC (@SQL)
I guess I need to combine this query somehow:
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME like 'CUST_TABLE%'
and COLUMN_NAME='field1'
You are close. Query the view INFORMATION_SCHEMA.COLUMNS. Aggregate per table name and make sure both columns exist for the table by counting them in the HAVING clause.
DECLARE @SQL VARCHAR(max)
SET @SQL = ''
SELECT @SQL = @SQL + CASE Len(@SQL) WHEN 0 THEN '' ELSE ' UNION ALL ' END
+ ' SELECT [field1], [field2] FROM dbo.[' + table_name + ']'
FROM information_schema.columns
WHERE table_name LIKE 'CUST_TABLE%'
GROUP BY table_name
HAVING COUNT(CASE WHEN COLUMN_NAME = 'FIELD1' THEN 1 END) > 0
AND COUNT(CASE WHEN COLUMN_NAME = 'FIELD2' THEN 1 END) > 0
EXEC (@SQL)

How to check a condition against all the columns of a table?

I have a table which has more than 30 columns (all are varchar). I need to list all the columns which contain blank values, i.e. ' '.
I tried using 'coalesce' but it only handles NULL.
The following query will give you all the columns in a table that might have null or '' values.
It is written so that you can run it for all tables in your database but you can limit it to a single table, as I have done for this specific example, checking a table called testingNulls:
--two variables needed for table name and column name, when looping through all tables
declare @table varchar(255), @col varchar(255), @sql varchar(max)
--this will be used to store the result, to have one result set instead of one row per each cursor cycle
if object_id('tempdb..#nullcolumns') is not null drop table #nullcolumns
create table #nullcolumns (tablename varchar(255), columnname varchar(255))
declare getinfo cursor for
select t.name tablename, c.name
from sys.tables t join sys.columns c on t.object_id = c.object_id
where t.name = 'testingnulls' --here the condition for the table name
open getinfo
fetch next from getinfo into @table, @col
while @@fetch_status = 0
begin
select @sql = 'if exists (select top 1 * from [' + @table + '] where [' + @col + '] is null or [' + @col + '] like '''' ) begin insert into #nullcolumns select ''' + @table + ''' as tablename, ''' + @col + ''' as all_nulls end'
print(@sql)
exec(@sql)
fetch next from getinfo into @table, @col
end
close getinfo
deallocate getinfo
--this should be the result you need:
select * from #nullcolumns
You can see a working example here. I hope this is what you need.
List all columns that contain a blank in some record? You'd use a query per column and collect the results with UNION ALL:
select 'COL1' where exists (select * from mytable where col1 like '% %')
union all
select 'COL2' where exists (select * from mytable where col2 like '% %')
union all
...
union all
select 'COL30' where exists (select * from mytable where col30 like '% %');
If you want something like select * from [your_table_name] where [col1] = '' and [col2] = '' ..., then use a dynamic SQL query like the one below.
Query
declare @sql as varchar(max);
select @sql = 'select * from [your_table_name] where '
+ stuff((
select ' and [' + [column_name] + '] = ' + char(39) + char(39)
from information_schema.columns
where table_name = 'your_table_name'
for xml path('')
)
, 1, 5, ''
);
exec(@sql);
Update
Or else if you want to list the column names which have a blank value, then you can use the below dynamic sql query.
Query
declare @sql as varchar(max);
select @sql = stuff((
select ' union all select ' + [column_name] + ' as [col1], '
+ char(39) + [column_name] + char(39) + ' as [col2]'
+ ' from your_table_name'
from information_schema.columns
where table_name = 'your_table_name'
for xml path('')
)
, 1, 11, ''
);
set @sql = 'select distinct t.col2 as [blank_cols] from(' + @sql
+ ')t
where coalesce(ltrim(rtrim(t.col1)), ' + char(39) + char(39) + ') = '
+ char(39) + char(39) + ';';
exec(@sql);
Find a demo here
But still I'm not sure that this is what you are looking out for.
You don't have many choices other than specifying all the columns in your WHERE clause:
WHERE COL1 = '' AND COL2 = '' AND COL3 = '' AND . . .
Alternatively, you can use dynamic SQL to build the query, but that is not an easy path to take.
If you want to count number of columns having '' value in a table (not for each row) then use the following
SELECT max(CASE WHEN col1 = '' THEN 1 ELSE 0 END) +
max(CASE WHEN col2 = '' THEN 1 ELSE 0 END) +
max(CASE WHEN col3 = '' THEN 1 ELSE 0 END) +
...
FROM t
I created a dynamic SQL script that you can use by providing the table name only
Here it is
declare @sql nvarchar(max)
declare @table sysname = 'ProductAttributes'
select @sql =
'select * from ' + @table + ' where ' +
string_agg('[' + name + '] = '' '' ', ' and ')
from sys.columns
where object_id = OBJECT_ID(@table)
select @sql
exec sp_executesql @sql
Unfortunately, the STRING_AGG function used here for string concatenation is only available from SQL Server 2017 onward.
On older versions it is also possible to use FOR XML PATH to concatenate the WHERE clause fragments:
SELECT @sql = 'select * from ' + @table + ' where ' +
STUFF(
(
SELECT
' and ' + '[' + [name] + '] = '' '' '
from sys.columns
where object_id = OBJECT_ID(@table)
FOR XML PATH(''),TYPE
).value('.','VARCHAR(MAX)'
), 1, 5, ''
)
select @sql as sqlscript
exec sp_executesql @sql

select columns with value NA

How to select columns in a table that only contain a specific value for all the rows? I am trying to find these columns to do an update on those values with a NULL value. In my columns I have varied range of values including NA
I am using SQL Server 2012.
I've tried the following, but this only gives me column names. Can I add a condition to it for columns with the value 'NA'?
SELECT COLUMN_NAME AS NAMES,COLUMN_DEFAULT
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
AND TABLE_NAME = 'ABC'
I am a beginner in SQL. Trying to figure out how to do this.
If min of column equals to max then that column contains same values:
Select
case when min(col1) = max(col1) then 1 else 0 end as Col1IsSame,
case when min(col2) = max(col2) then 1 else 0 end as Col2IsSame,
...
from Table
With dynamic query:
declare @s nvarchar(max) = 'select '
select @s = @s + 'case when min(' + COLUMN_NAME + ') = max(' +
COLUMN_NAME + ') then 1 else 0 end as ' + COLUMN_NAME + ','
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
AND TABLE_NAME = 'Table'
Set @s = substring(@s, 1, len(@s) - 1) + ' from Table'
exec(@s)
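The min/max test above flags any column that holds a single repeated value. For the specific case in the question, a column can instead be treated as "all NA" when every row equals 'NA'. The following is a hedged sketch of that variant, restricted to character columns. The table dbo.ABC comes from the question; the variable name and the output aliases are illustrative.

```sql
DECLARE @sql nvarchar(max);

-- One flag per character column: 1 when every row equals 'NA'.
-- A NULL or any other value makes the two counts differ, giving 0.
-- An empty table reports 1 for every column, because both counts are 0.
SELECT @sql = ISNULL(@sql + N', ', N'SELECT ')
            + N'CASE WHEN COUNT(*) = COUNT(CASE WHEN ' + QUOTENAME(COLUMN_NAME)
            + N' = ''NA'' THEN 1 END) THEN 1 ELSE 0 END AS ' + QUOTENAME(COLUMN_NAME)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'ABC'
  AND DATA_TYPE IN ('char', 'varchar', 'nchar', 'nvarchar');

-- @sql stays NULL when no matching columns exist, so guard before running.
IF @sql IS NOT NULL
BEGIN
    SET @sql = @sql + N' FROM dbo.ABC';
    PRINT @sql;
    EXEC sys.sp_executesql @sql;
END
```

Whether 'na' or 'NA ' also match depends on the column collation and on SQL Server's trailing-space comparison rules, so the result is worth spot-checking before running any UPDATE based on it.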
TRY THIS QUERY
DECLARE @SQLQUERY NVARCHAR(MAX)
declare @tableName varchar(50)
DECLARE @NAME VARCHAR(50)
Declare @ParamDefinition AS NVarchar(2000)
Set @ParamDefinition = '@OIM VARCHAR(20)'
SELECT NAME
FROM sys.objects
WHERE [object_id]=@OIM
set @tableName= (SELECT NAME
FROM sys.objects
WHERE [object_id]=@OIM)
SET @NAME=(SELECT C.NAME
FROM sys.columns c
JOIN
sys.tables t ON c.object_id = t.object_id
WHERE c.name in (select distinct name
from sys.columns
where object_id=@OIM))
SET @SQLQUERY = ''
SELECT @SQLQUERY = @SQLQUERY + 'UPDATE ' + @tableName + ' SET ' + @NAME + ' = NULL WHERE ' + @NAME + ' = ''NA'' ; '
PRINT @SQLQUERY
Execute sp_Executesql @SQLQUERY , @ParamDefinition, @OIM
end

Global Update (All Rows) On One SQL Table With The Same Criteria

Is there a way to run this across every column in a table:
UPDATE dbo.stage_a
SET Statement_Name= NULL
WHERE Statement_Name='""';
I am trying to tidy up after a data import.
Dynamic query plus Information_schema.columns. Try this.
DECLARE @cols NVARCHAR(max)='UPDATE dbo.stage_a set '
SELECT @cols += COLUMN_NAME + '=case when ' + COLUMN_NAME
+ ' = ''""'' then null else '+COLUMN_NAME+' end,'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'stage_a'
AND TABLE_SCHEMA = 'dbo'
SELECT @cols = LEFT(@cols, Len(@cols) - 1)
PRINT @cols
EXEC Sp_executesql @cols
UPDATE Customers
SET ContactName='Alfred Schmidt', City='Hamburg'
WHERE CustomerName='Alfreds Futterkiste';
As in this example, you should specify all columns (separated by commas).
If you want to replace all columns with NULL, why don't you delete the rows instead:
DELETE FROM Customers
WHERE CustomerName='Alfreds Futterkiste'
You need to first get the list of all columns, here from sys.columns (INFORMATION_SCHEMA.COLUMNS would work as well):
DECLARE @tableName varchar(10)
SET @tableName = 'mm'
DECLARE @sql VARCHAR(MAX)
SET @sql = ''
SELECT @sql = @sql + 'UPDATE ' + @tableName + ' SET ' + c.name + ' = NULL WHERE ' + c.name + '= '''';'
FROM sys.columns c
INNER JOIN sys.tables t ON c.object_id = t.object_id
INNER JOIN sys.types y ON c.system_type_id = y.system_type_id
WHERE t.name = @tableName AND y.name IN ('varchar', 'nvarchar', 'char', 'nchar')
EXEC (@sql)
This query is not tested; you may need to tweak it to suit your needs.

Return a percentage of NULL values for a specific record

The following query returns, for each field of the table, the percentage of rows in which that field is null. What I want is to get the sum of those percentages for a specific ProductID. Also, I would like to get a percentage (in an extra column) of the fields that have no value, i.e. = "". Any ideas?
use AdventureWorks
DECLARE @TotalCount decimal(10,2), @SQL NVARCHAR(MAX)
SELECT @TotalCount = COUNT(*) FROM [AdventureWorks].[Production].[Product]
SELECT @SQL =
COALESCE(@SQL + ', ','SELECT ') +
'cast(sum (case when ' + QUOTENAME(column_Name) +
' IS NULL then 1 else 0 end)/@TotalCount*100.00 as decimal(10,2)) as [' +
column_Name + ' NULL %]
'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Product' and TABLE_SCHEMA = 'Production'
SET @SQL = 'set @TotalCount = NULLIF(@TotalCount,0)
' + @SQL + '
FROM [AdventureWorks].Production.Product'
print @SQL
EXECUTE SP_EXECUTESQL @SQL, N'@TotalCount decimal(10,2)', @TotalCount
You can use the following:
use AdventureWorks
DECLARE @colCount int;
DECLARE @nullCheck nvarchar(max) = N'';
DECLARE @emptyCheck nvarchar(max) = N'';
DECLARE @SQL NVARCHAR(MAX);
DECLARE @KeyToCheck int = 123; -- adapt as necessary
SELECT
@nullCheck += '
+ ' + 'count(' + QUOTENAME(column_Name) + ')'
,@emptyCheck += '
+ ' +
CASE
WHEN DATA_TYPE IN('bigint', 'int', 'smallint', 'tinyint', 'bit', 'money', 'smallmoney', 'numeric', 'decimal', 'float', 'real') THEN
-- check numeric data for zero
'sum(case when coalesce(' + QUOTENAME(column_Name) + ', 0) = 0 then 1 else 0 end)'
WHEN DATA_TYPE like '%char' or DATA_TYPE like '%text' THEN
--check character data types for empty string
'sum(case when coalesce(' + QUOTENAME(column_Name) + ', '''') = '''' then 1 else 0 end)'
ELSE -- otherwise, only check for null
'sum(case when ' + QUOTENAME(column_Name) + ' IS NULL then 1 else 0 end)'
END
,@colCount =
count(*) over()
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Product' and TABLE_SCHEMA = 'Production'
;
SET @SQL = 'SELECT case when count(*) > 0 then 100.00 - (' + @nullCheck + '
) * 100.00 / ' + cast(@colCount as nvarchar(max)) + '.00 / count(*) end as null_percent
, case when count(*) > 0 then (' + @emptyCheck + '
) * 100.00 / ' + cast(@colCount as nvarchar(max)) + '.00 / count(*) end as empty_percent
FROM Production.Product
WHERE ProductID = ' + cast(@KeyToCheck as nvarchar(max))
;
print @SQL;
EXECUTE (@SQL)
I simplified one of your expressions: Instead of sum (case when <column> IS NULL then 1 else 0 end), you can just use count(<column>). When using count with an expression instead of *, it counts the rows where this expression is non-null. As this is the opposite from what you need, I added the 100.00 - as the start of the SELECT.
For the "empty check", this would make the logic more complex to understand, hence I left the original logic there and extended it. There, I implemented a check for emptiness for numeric and character/text data types. You can easily extend that for date, binary data etc. with whichever logic you use to determine if a column is empty.
I also found it simpler to leave the first + in the two variables @nullCheck and @emptyCheck, as it is valid SQL to start an expression with a unary plus.
I also extended the statement so that if there is potentially more than one record with ProductId = 123, it shows the average across all records, i.e. the total sum divided by the count of rows. And the outermost case expressions just avoid a division by zero error if count(*) is zero, i.e. no record with ProductId = 123 was found. In that case the return value is null.
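To make the count(<column>) behavior described above concrete, here is a small self-contained sketch. The table variable @t and its data are made up for illustration.

```sql
-- Made-up sample data: four rows, two of them with a NULL in val.
DECLARE @t TABLE (id int NOT NULL, val varchar(10) NULL);
INSERT INTO @t (id, val) VALUES (1, 'a'), (2, NULL), (3, NULL), (4, 'd');

SELECT
    COUNT(*)              AS total_rows,     -- 4: counts every row
    COUNT(val)            AS non_null_rows,  -- 2: skips rows where val IS NULL
    COUNT(*) - COUNT(val) AS null_rows       -- 2
FROM @t;
```

The difference COUNT(*) - COUNT(val) gives the same number as SUM(CASE WHEN val IS NULL THEN 1 ELSE 0 END), which is why the script subtracts the non-null share from 100.00.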
You could use AVG function:
SELECT AVG(CASE WHEN value IS NULL THEN 100 ELSE 0 END) AS Percents
FROM Table
UPDATE:
Here is your script:
DECLARE @SQL NVARCHAR(MAX), @TABLE_NAME NVARCHAR(MAX), @TABLE_SCHEMA NVARCHAR(MAX), @PK NVARCHAR(MAX)
SET @TABLE_NAME = 'tblBigTable'
SET @TABLE_SCHEMA = 'dbo'
SET @PK = '8'
SELECT
@SQL = COALESCE(@SQL + ', ', 'SELECT ') +'AVG(CASE WHEN ' + COLUMN_NAME + ' IS NULL THEN 100 ELSE 0 END) AS [' + COLUMN_NAME +' NULL %]'
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_SCHEMA = @TABLE_SCHEMA AND
TABLE_NAME = @TABLE_NAME
SET @SQL = @SQL + ' FROM ' + @TABLE_NAME + ' WHERE pkId = ''' + @PK + ''''
print @SQL
EXECUTE SP_EXECUTESQL @SQL
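One detail worth checking with the AVG approach: when both CASE branches are integers, SQL Server's AVG returns an integer, so the fractional part of the percentage is truncated. A small sketch with a made-up table variable shows the difference; using 100.0 instead of 100 is one possible way to keep the decimals.

```sql
-- Made-up sample data: three rows, one of them NULL.
DECLARE @t TABLE (val varchar(10) NULL);
INSERT INTO @t (val) VALUES ('a'), (NULL), ('c');

SELECT
    -- Integer branches: 100 / 3 is computed with integer division, giving 33.
    AVG(CASE WHEN val IS NULL THEN 100   ELSE 0 END) AS int_percent,
    -- A decimal literal makes the CASE numeric, so the result is about 33.33.
    AVG(CASE WHEN val IS NULL THEN 100.0 ELSE 0 END) AS decimal_percent
FROM @t;
```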