Find columns that contain only zeros - sql

I'm working with SQL Server 2008. I have a list of column names on a table and I'd like to know how to use SQL to return the names of those columns which contain nothing but zero or NULL values.

declare #T table
(
Col1 int,
Col2 int,
Col3 int,
Col4 int
)
insert into #T values
(1, 0 , null, null),
(0, null, 0 , 1)
select U.ColName
from
(
select count(nullif(Col1, 0)) as Col1,
count(nullif(Col2, 0)) as Col2,
count(nullif(Col3, 0)) as Col3,
count(nullif(Col4, 0)) as Col4
from #T
) as T
unpivot
(C for ColName in (Col1, Col2, Col3, Col4)) as U
where U.C = 0
Result:
ColName
----------
Col2
Col3
The idea behind this is to count the non null values and only keep those with a count of 0.
COUNT will only count non null values.
NULLIF(ColX, 0) will make all 0 into null.
The inner query returns one row with four columns. UNPIVOT will turn it around so you have two columns and four rows.
Finally where U.C = 0 makes sure that you only get the columns that has no values other than null or 0.

Here is a brute force way, since you know all the column names.
CREATE TABLE dbo.splunge
(
a INT,
b INT,
c INT,
d INT
);
INSERT dbo.splunge VALUES (0,0,1,-1), (0,NULL,0,0), (0,0,0,NULL);
SELECT
cols = STUFF(
CASE WHEN MIN(COALESCE(a,0)) = MAX(COALESCE(a,0)) THEN ',a' ELSE '' END
+ CASE WHEN MIN(COALESCE(b,0)) = MAX(COALESCE(b,0)) THEN ',b' ELSE '' END
+ CASE WHEN MIN(COALESCE(c,0)) = MAX(COALESCE(c,0)) THEN ',c' ELSE '' END
+ CASE WHEN MIN(COALESCE(d,0)) = MAX(COALESCE(d,0)) THEN ',d' ELSE '' END,
1, 1, '')
FROM dbo.splunge;
-- result:
-- a,b
GO
DROP TABLE dbo.splunge;
You could probably generate much of this script instead of doing it manually, assuming you know the naming scheme or data type of the columns you want (or just by leaving off the where clause entirely and removing the columns you don't want manually).
SELECT CHAR(13) + CHAR(10) + ' + CASE WHEN MIN(COALESCE(' + name + ',0)) = '
+ 'MAX(COALESCE(' + name + ',0)) THEN '',' + name + ''' ELSE '''' END'
FROM sys.columns
WHERE [object_id] = OBJECT_ID('dbo.splunge')
-- AND system_type_id = 56
-- AND name LIKE '%some pattern%'
;
The output will look like the middle of the first query, so you can copy & paste and then remove the first + and add the surrounding STUFF and query...

Here's a way that works for any table:
declare #tableName nvarchar(max) = N'myTable'
declare #columnName nvarchar(max)
create table #zeros (column_name varchar(max))
declare c cursor local forward_only read_only for
select column_name
from information_schema.COLUMNS WHERE table_name = #tableName
open c
fetch next from c into #columnName
while ##FETCH_STATUS = 0
begin
declare #retval int
declare #sql nvarchar(max) =
N'set #retval = (select count(*) from ' + #tableName + N' where coalesce(' + #columnName + N', 0) <> 0)'
exec sp_executesql #sql, N'#retval int out', #retval=#retval out
select #retval
if #retval = 0
begin
insert into #zeros (column_name) values (#columnName)
end
fetch next from c into #columnName
end
close c
deallocate c
select * from #zeros
drop table #zeros

Related

How to loop through a list of table names and see if a specific value in a column exists?

I have produced a table in SQL with a list of tables. This list of tables is stored under the column 'table_name'. I want to loop through each entry under 'table_name' and return a 1 if that table has a value in a specific column or 0 if that table does not have a value in a specific column.
How would I do that?
Edited With sample data
table_name
tabel1
table2
table3
table4
Pseudo Code
For i in table_name
if count(table_name["col_name"] = "value") > 0
return 1
else
return 0
Try this:
drop table if exists #t
create table #t (A int)
insert into #t
select 1
drop table if exists #t2
create table #t2 (A int)
insert into #t2
select 0
drop table if exists #tables
create table #tables (tab varchar(100))
declare
#loop table (rn int, tab varchar(100))
declare
#res table (cnt int)
declare
#i int=1
,#tab varchar (100)=''
,#query nvarchar (max)
insert into #tables
select '#t'
union all
select '#t2'
insert into #loop
select ROW_NUMBER () over (partition by (select 1) order by tab),tab from #tables
while #i<=(select max(rn) from #loop)
begin
select #tab=tab from #loop where rn=#i
set #query='select count(*) from '+#tab+' where a=1'
insert into #res
exec(#query)
if (select cnt from #res)>0 select 'Exists' else select 'Not Exists'
delete #res
set #i=#i+1
end
Assuming you have a singular column/value in question, you can try the following in SSMS:
DECLARE #Tables table ( table_name varchar(50) );
INSERT INTO #Tables VALUES
( 'Child' ), ( 'COS' ), ( 'CustomData' ), ( 'Misc' ), ( 'Misc2' );
DECLARE
#col varchar(50) = 'id', -- column to be queried
#val varchar(50) = '1', -- value to be queried
#sql varchar(MAX) = '' -- important! set to empty string;
SELECT
#sql = #sql + CASE WHEN LEN( #sql ) > 0 THEN ' UNION ' ELSE '' END
+ 'SELECT ' + QUOTENAME( table_name, '''' ) + ' AS [table_name], COUNT(*) AS [value_count] FROM [' + table_name + '] WHERE [' + #col + ']=' + QUOTENAME( #val, '''' )
FROM #Tables t WHERE EXISTS (
SELECT * FROM sys.columns c WHERE c.object_id = OBJECT_ID( t.table_name ) AND c.[name] = #col
);
EXEC( #sql );
In my environment this returns:
+------------+-------------+
| table_name | value_count |
+------------+-------------+
| Child | 1 |
| Misc | 1 |
| Misc2 | 0 |
+------------+-------------+
This is the (beautified) dynamic SQL created:
SELECT 'Child' AS [table_name], COUNT(*) AS [value_count] FROM [Child] WHERE [id]='1'
UNION
SELECT 'Misc' AS [table_name], COUNT(*) AS [value_count] FROM [Misc] WHERE [id]='1'
UNION
SELECT 'Misc2' AS [table_name], COUNT(*) AS [value_count] FROM [Misc2] WHERE [id]='1'
The EXISTS in the WHERE clause eliminates any tables that do not have the column in question, and thereby any errors related to it.

Select from a table only the columns that are not empty?

I have a table with hundreds of columns:
------------------------------------------------
ID ColA ColB ColC Col D ... ColZZZ
------------------------------------------------
1 bla
2 foo
3 bar
4 baz
------------------------------------------------
I need to know which columns have no values in them (that is: which are empty '' not NULL)
I could create a query for every column:
select count(1) from [table] where [ColA] <> ''; -- returns 2, so not, not empty
select count(1) from [table] where [ColB] <> ''; -- returns 1, so no
select count(1) from [table] where [ColC] <> ''; -- returns 0, so yay! found and empty one
...
etc
But there has to be an easier way for this?
Is there a way to return [table] without the empty columns, in other words:
----------------------------
ID ColA ColB ColZZZ
----------------------------
1 bla
2 foo
3 bar
4 baz
----------------------------
Here is solution to it. I used this query before too search for empty columns across all tables. Slightly modified now to search for non-empty, it might have few extra parts not needed in you example.
You create a temp table to store column names that are not empty, and use cursor to create dynamic sql to search for them.
In the end, just generate another dynamic sql to select columns based on temp table results.
IF (OBJECT_ID('tempdb..#tmpRez') IS NOT NULL) DROP TABLE #tmpRez;
CREATE TABLE #tmpRez (TableName sysname, ColName sysname);
DECLARE crs CURSOR LOCAL FAST_FORWARD FOR
SELECT t.name, c.name FROM sys.tables t
INNER JOIN sys.columns c ON c.object_id=t.object_id
WHERE 1=1
AND t.name = 'Table1' -- OR your own condition
OPEN crs;
DECLARE #tbl sysname;
DECLARE #col sysname;
DECLARE #sql NVARCHAR(MAX);
FETCH NEXT FROM crs INTO #tbl,#col;
WHILE ##FETCH_STATUS = 0
BEGIN
SELECT #sql = 'IF EXISTS (SELECT * FROM ['+ #tbl+'] WHERE [' + #col + '] <> '''') INSERT INTO #tmpRez SELECT ''' + #tbl +''','''+ #col + '''';
EXEC(#sql);
FETCH NEXT FROM crs INTO #tbl,#col;
END;
CLOSE crs;
DEALLOCATE crs;
SELECT #sql = 'SELECT ' + STUFF((SELECT ',' + ColName FROM #tmpRez x
where x.TableName = y.TableName
FOR XML PATH ('')), 1, 1, '') + ' FROM ' + TableName
FROM #tmpRez y GROUP BY TableName
EXEC (#sql)
How about this to return the table with no empty columns:
SELECT * from table
WHERE column IS NOT NULL AND TRIM(column) <> ''
This to return the table with empty columns:
SELECT * from table
WHERE column IS NULL AND TRIM(column) = ''

Get top three most common values from every column in a table

I'm trying to write a query that will produce a very small sample of data from each column of a table, in which the sample is made up of the top 3 most common values. This particular problem is part of a bigger task, which is to write scripts that can characterize a database and its tables, its data integrity, and also quickly survey common values in the table on a per-column basis. Think of this as an automated "analysis" of a table.
On a single column basis, I do this already by simply calculating the frequency of values and then sorting by frequency. If I had a column called "color" and all colors were in it, and it just so happened that the color "blue" was in most rows, then the top 1 most frequently occurring value would be "blue". In SQL that is easy to calculate.
However, I'm not sure how I would do this over multiple columns.
Currently, when I do a calculation over all columns of a table, I perform the following type of query:
USE database;
DECLARE #t nvarchar(max)
SET #t = N'SELECT '
SELECT #t = #t + 'count(DISTINCT CAST(' + c.name + ' as varchar(max))) "' + c.name + '",'
FROM sys.columns c
WHERE c.object_id = object_id('table');
SET #t = SUBSTRING(#t, 1, LEN(#t) - 1) + ' FROM table;'
EXEC sp_executesql #t
However, its not entirely clear to me how I would do that here.
(Sidenote:columns that are of type text, ntext, and image, since those would cause errors while counting distinct values, but i'm less concerned about solving that)
But the problem of getting top three most frequent values per column has got me absolutely stumped.
Ideally, I'd like to end up with something like this:
Col1 Col2 Col3 Col4 Col5
---------------------------------------------------------------------
1,2,3 red,blue,green 29,17,0 c,d,j nevada,california,utah
I hacked this together, but it seems to work:
I cant help but think I should be using RANK().
USE <DB>;
DECLARE #query nvarchar(max)
DECLARE #column nvarchar(max)
DECLARE #table nvarchar(max)
DECLARE #i INT = 1
DECLARE #maxi INT = 10
DECLARE #target NVARCHAR(MAX) = <table>
declare #stage TABLE (i int IDENTITY(1,1), col nvarchar(max), tbl nvarchar(max))
declare #results table (ColumnName nvarchar(max), ColumnValue nvarchar(max), ColumnCount int, TableName NVARCHAR(MAX))
insert into #stage
select c.name, o.name
from sys.columns c
join sys.objects o on o.object_id=c.object_id and o.type = 'u'
and c.system_type_id IN (select system_type_id from sys.types where [name] not in ('text','ntext','image'))
and o.name like #target
SET #maxi = (select max(i) from #stage)
while #i <= #maxi
BEGIN
set #column = (select col from #stage where i = #i)
set #table = (select tbl from #stage where i = #i)
SET #query = N'SELECT ' +''''+#column+''''+' , '+ #column
SELECT #query = #query + ', COUNT( ' + #column + ' ) as count' + #column + ' , ''' + #table + ''' as tablename'
select #query = #query + ' from ' + #table + ' group by ' + #column
--Select #query
insert into #results
EXEC sp_executesql #query
SET #i = #i + 1
END
select * from #results
; with cte as (
select *, ROW_NUMBER() over (partition by Columnname order by ColumnCount desc) as rn from #results
)
select * from cte where rn <=3
Start with this SQL Statement builder, and modify it to suit your liking:
EDIT Added Order by Desc
With ColumnSet As
(
Select TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
From INFORMATION_SCHEMA.COLUMNS
Where 1=1
And TABLE_NAME IN ('Table1')
And COLUMN_NAME IN ('Column1', 'Column2')
)
Select 'Select Top 3 ' + COLUMN_NAME + ', Count (*) NumInstances From ' + TABLE_SCHEMA + '.'+ TABLE_NAME + ' Group By ' + COLUMN_NAME + ' Order by Count (*) Desc'
From ColumnSet

Selecting columns from a query

I'm using a request to get a collection of columns name:
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE [...]
From this collection, I'd like to count every not null, not empty value from the original table group by column name.
Let's say I have a table containing
COL1 | COL2 | COL3
------------------
VAL1 | VAL2 | NULL
VAL3 | | VAL4
VAL5 | |
I'm looking for a request to get:
COL1 | 3
COL2 | 1
COL2 | 1
It's for analytics purpose.
Thanks for your help!
Here is a simple process. Run the following query:
SELECT 'SELECT ''' + COLUMN_NAME + ''', COUNT(['+COLUMN_NAME']) as NotNull FROM [' +SCHEMA_NAME+ '].['+TABLE_NAME+ '] union all '
FROM INFORMATION_SCHEMA.COLUMNS
WHERE [...]
Copy the results into a query window, remove the final union all, and run the query.
The below code seems to work for your issue
create table sample
(
col1 varchar(10),
col2 varchar(10),
col3 varchar(10)
)
INSERT INTO sample (COL1,COL2,COL3) VALUES ('VAL1 ',' VAL2 ',NULL);
INSERT INTO sample (COL1,COL2,COL3) VALUES ('VAL3 ',' ',' VAL4');
INSERT INTO sample (COL1,COL2,COL3) VALUES ('VAL5 ',' ',' ');
DECLARE #cols1 NVARCHAR(MAX);
DECLARE #sql NVARCHAR(MAX);
SELECT #cols1 = STUFF((
SELECT ', COUNT(CASE WHEN len(['+ t1.NAME + '])!=0 THEN 1 END) AS ' + t1.name
FROM sys.columns AS t1
WHERE t1.object_id = OBJECT_ID('sample')
--ORDER BY ', COUNT([' + t1.name + ']) AS ' + t1.name
FOR XML PATH('')
), 1, 2, '');
SET #sql = '
SELECT ' + #cols1 + '
FROM sample
'
EXEC(#sql)
Hereis my little longer take on this:
declare #cols table (colID integer, colName varchar(50))
declare #results table (colName nvarchar(50), valueCount bigint)
-- table name
declare #tableName nvarchar(50) = 'INSERT TABLE NAME HERE'
-- select column names from table
insert into #cols
select column_id, name from sys.columns where object_id = object_id(#tableName) order by column_id
declare #currentColID int = 0
declare #currentName nvarchar(50) = ''
declare #currentCount bigint = 0
declare #sql nvarchar(max) -- where the dynamic sql will be stored
-- go through all columns
while (1 = 1)
begin
-- step by id
select top 1 #currentColID = c.colID, #currentName = c.colName from #cols c
where c.colid > #currentColID order by c.colID
if ##ROWCOUNT = 0 break;
-- dynamic query to get non-empty, not-null counts
select #sql = 'select #p1=COUNT(' + #currentName + ') from ' + #tableName +
' where ' + #currentName + ' is not null or LEN(' + #currentName + ') > 0'
exec sp_executesql #sql, N'#p1 bigint output', #p1 = #currentCount output
-- insert result to buffer
insert into #results values (#currentName, #currentCount)
end
-- print the buffer
select * from #results
Have fun :)

Need SQL Query Help, matching a stored procedure list parameter against individual columns

I have a stored procedure..
Proc:
spMyStoredProcedure
Takes a Param:
#List varchar(255)
The #List would be a comma separated list of values i.e... #List = '1,2,3'
(clarity.. a single value of 1 would mean all records with col1 = true)
I have a table with these columns: ID int, col1 bit, col2 bit, col3 bit, col4 bit.
ID | col1 | col2 | col3 | col4
------------------------------
12 | 0 | 1 | 0 | 0
13 | 1 | 0 | 0 | 0
14 | 0 | 0 | 1 | 0
15 | 0 | 0 | 0 | 1
I'd like my result to only include ID's for those rows in the list.
i.e. 12,13,14.
My first thought is to loop through the list and do a select. ie. for the first value being 1, I grab all records with a 1 (true) in col1 (resulting in record 12). Then move onto the second value being 2 and grab all records with a 1 (true) in col2 (resulting in record 14) and so on.. I'm wondering if there's a more efficient/better/cleaner way to do this?
I think this solves the problem:
declare #sql as nvarchar(max)
set #sql = 'select * from table where col' +
replace(#list, ',', '=1 or col') + '=1'
sp_executesql #sql
This is assuming that list is not user-generated, and it's code generated, so I'm not guarding against SQL injection attacks. A list like this isn't generally something that's user-generated, so that's why I'm assuming as such.
Is this what you're looking for?
If you are using Sql Server 2005 you should be able to use the PIVOT function to transpose your data.
Consider the following, using dynamic SQL. This is probably not the best method, but it's the only way I can think of to not issue multiple select statements against your table. This assumes you have cleansed your list of values and you've done the error checking first...
CREATE TABLE #tmp
(
ID INT,
col1 BIT,
col2 BIT,
col3 BIT,
col4 BIT
)
INSERT INTO #tmp (ID, col1, col2, col3, col4) VALUES (12,0,1,0,0)
INSERT INTO #tmp (ID, col1, col2, col3, col4) VALUES (13,1,0,0,0)
INSERT INTO #tmp (ID, col1, col2, col3, col4) VALUES (14,0,0,1,0)
INSERT INTO #tmp (ID, col1, col2, col3, col4) VALUES (15,0,0,0,1)
DECLARE #List VARCHAR(255)
SELECT #List = '1,2,3'
-- create dynamic sql statement
DECLARE #sql VARCHAR(MAX)
SET #sql = 'SELECT ID FROM #tmp WHERE '
-- append where statement
IF CHARINDEX('1',#List) > 0 BEGIN
SET #sql = #sql + 'col1 = 1 OR '
END
IF CHARINDEX('2',#List) > 0 BEGIN
SET #sql = #sql + 'col2 = 1 OR '
END
IF CHARINDEX('3',#List) > 0 BEGIN
SET #sql = #sql + 'col3 = 1 OR '
END
IF CHARINDEX('4',#List) > 0 BEGIN
SET #sql = #sql + 'col4 = 1 OR '
END
-- remove the trailing 'OR' and execute
SET #sql = SUBSTRING(#sql,0,LEN(#sql)-2)
EXEC (#sql)
Usually I use a calculated column to store a total of bit columns. Assuming you cannot define the calculated column, we can use similar approach like this
SELECT ID FROM SampleTable WHERE (col1*1 + col2*2 + col3*4 + col4*8) IN (1, 2, 3)
Since 1,2,3 are all concatenated in a variable, for that I usually use a User Defined function as defined at http://www.sqlteam.com/article/using-a-csv-with-an-in-sub-select.
So, it would be something like
SELECT ID FROM SampleTable WHERE (col1*1 + col2*2 + col3*4 + col4*8) IN dbo.CSVToInt(#List)
** wrong answer, check Eric's answer for a better reply!
Luckily, your list seems to match the IN format:
select * from yourtable where id in (1,2,3,4)
So you could easily build a dynamic query:
declare #query varchar(4000)
set #query = 'select * from yourtable where id in (' +
#list + ')'
exec sp_executesql #query
Keep in mind that your client has to ensure the #list parameter is injection clean. You could verify in the stored procedure that it doesn't contain any single quotes.
We use a UDF that parses a comma-list and returns a table value:
CREATE FUNCTION [dbo].[ListToTable] (
#list VARCHAR(MAX),
#separator VARCHAR(MAX) = ','
)
RETURNS #table TABLE (Value VARCHAR(MAX))
AS BEGIN
DECLARE #position INT, #previous INT
SET #list = #list + #separator
SET #previous = 1
SET #position = CHARINDEX(#separator, #list)
WHILE #position > 0 BEGIN
IF #position - #previous > 0
INSERT INTO #table VALUES (LTRIM(RTRIM(SUBSTRING(#list, #previous, #position - #previous))))
IF #position >= LEN(#list) BREAK
SET #previous = #position + 1
SET #position = CHARINDEX(#separator, #list, #previous)
END
RETURN
END
Then you can do something like:
SELECT *
FROM MyTable m
JOIN dbo.ListToTable(#List) l
ON m.ID = l.Value
The function above also takes an optional separator like so:
SELECT *
FROM MyTable m
JOIN dbo.ListToTable('1:2:3', ':') l
ON m.ID = l.Value