How to detect and remove a column that contains only null values? - sql

In my table table1 there are 6 columns Locations,a,b,c,d,e.
Locations [a] [b] [c] [d] [e]
[1] 10.00 Null Null 20.00 Null
[2] Null 30.00 Null Null Null
I need the result to look like this:
Locations [a] [b] [d]
[1] 10.00 Null 20.00
[2] Null 30.00 Null
My question is how to detect and delete a column that contains only null values using a SQL query.
Is it possible?
If yes then please help and give sample.

Here is a quick (and ugly) stored procedure that takes the name of a table and prints (or drops, if you uncomment the exec line) the columns that contain only nulls.
ALTER procedure mysp_DropEmptyColumns
    @tableName nvarchar(max)
as begin
    declare @FieldName nvarchar(max)
    declare @SQL nvarchar(max)
    declare @CountDef nvarchar(max)
    declare @FieldCount int
    declare fieldNames cursor local fast_forward for
        select c.name
        from syscolumns c
        inner join sysobjects o on c.id=o.id
        where o.xtype='U'
        and o.Name=@tableName
    open fieldNames
    fetch next from fieldNames into @FieldName
    while (@@fetch_status=0)
    begin
        -- count the non-null values in the current column
        set @SQL=N'select @Count=count(*) from "'+@tableName+'" where "'+@FieldName+'" is not null'
        SET @CountDef = N'@Count int output';
        exec sp_executeSQL @SQL, @CountDef, @Count = @FieldCount output
        if (@FieldCount=0)
        begin
            set @SQL = N'alter table "'+@tableName+'" drop column "'+@FieldName+'"'
            /* exec sp_executeSQL @SQL */
            print @SQL
        end
        fetch next from fieldNames into @FieldName
    end
    close fieldNames
    deallocate fieldNames
end
This uses a cursor, and is a bit slow and convoluted, but I suspect that this is a kind of procedure that you'll be running often

How to detect whether a given column has only the NULL value:
SELECT 1 -- no GROUP BY therefore use a literal
FROM Locations
HAVING COUNT(a) = 0
AND COUNT(*) > 0;
The resultset will either consist of zero rows (column a has a non-NULL value) or one row (column a has only the NULL value). FWIW this code is Standard SQL-92.
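The same COUNT(column) test can be applied to every column of a table in one pass. The following is only a sketch, assuming SQL Server 2017 or later (for STRING_AGG) and the dbo.table1 table from the question; the variable names are made up for illustration.

```sql
-- Sketch only: assumes SQL Server 2017+ (STRING_AGG) and a table dbo.table1.
DECLARE @table sysname = N'table1';   -- hypothetical: table to inspect
DECLARE @sql nvarchar(max);

-- Build one "COUNT([col]) AS [col]" expression per column.
-- COUNT(col) ignores NULLs, so 0 means the column holds no values at all.
-- Note: COUNT(col) is not allowed on text, ntext or image columns.
SELECT @sql = N'SELECT COUNT(*) AS total_rows, '
    + STRING_AGG(CAST(N'COUNT(' + QUOTENAME(c.name) + N') AS ' + QUOTENAME(c.name)
                      AS nvarchar(max)), N', ')
    + N' FROM dbo.' + QUOTENAME(@table) + N';'
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.' + QUOTENAME(@table));

EXEC sp_executesql @sql;
```

The result is a single row. Any column whose count is 0 while total_rows is greater than 0 contains only NULLs.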

SQL is more about working on rows rather than columns.
If you're talking about deleting rows where c is null, use:
delete from table1 where c is null
If you're talking about dropping a column when all rows have null for that column, I would just find a time where you could lock out the DB from users and execute one of:
select c from table1 group by c
select distinct c from table1
select count(c) from table1 where c is not null
Then, if you only get back just NULL (or 0 for that last one), weave your magic (the SQL Server command may be different):
alter table table1 drop column c
Do this for whatever columns you want.
You really need to be careful if you're deleting columns. Even though they may be full of nulls, there may be SQL queries out there that use that column. Dropping the column will break those queries.
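One way to look for dependent objects before dropping anything is to ask SQL Server which views, procedures, functions and triggers reference the table. This is a rough sketch that assumes the table is dbo.table1; it only sees objects inside the current database, so queries embedded in application code still have to be checked by hand.

```sql
-- Sketch only: dbo.table1 is the table from the question.
-- Lists objects in the current database that reference the table;
-- review each one before dropping a column.
SELECT referencing_schema_name,
       referencing_entity_name
FROM sys.dm_sql_referencing_entities(N'dbo.table1', N'OBJECT');
```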

SELECT * FROM table1 WHERE c IS NOT NULL -- or also SELECT COUNT(*)
To detect if indeed this column has no values at all.
ALTER TABLE table1 DROP COLUMN c
is the query to remove the column if it is deemed desirable.

Try this stored procedure with your table name as input.
alter proc USP_DropEmptyColumns
    @TableName varchar(255)
as
begin
    Declare @col varchar(255), @cmd varchar(max)
    DECLARE getinfo cursor for
        SELECT c.name FROM sys.tables t JOIN sys.columns c ON t.Object_ID = c.Object_ID
        WHERE t.Name = @TableName
    OPEN getinfo
    FETCH NEXT FROM getinfo into @col
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- drop the column only if no row has a non-null value in it
        SELECT @cmd = 'IF NOT EXISTS (SELECT top 1 * FROM [' + @TableName + '] WHERE [' + @col + '] IS NOT NULL)
        BEGIN
        ALTER TABLE [' + @TableName + '] DROP Column [' + @col + ']
        end'
        EXEC(@cmd)
        FETCH NEXT FROM getinfo into @col
    END
    CLOSE getinfo
    DEALLOCATE getinfo
end

PROC PRINT DATA=TABLE1;RUN;
PROC TRANSPOSE DATA=TABLE1 OUT=TRANS1;VAR A B C D E;RUN;
DATA TRANS2;SET TRANS1;IF COL1 = . AND COL2 = . THEN DELETE;RUN;
PROC TRANSPOSE DATA=TRANS2 OUT=TABLE2 (DROP=_NAME_);VAR COL1-COL2;RUN;
PROC PRINT DATA=TABLE2;RUN;

To run the stored procedure against every table in your database:
DECLARE @table_name AS VARCHAR(128);
DECLARE table_cursor CURSOR FOR
    SELECT name FROM sys.tables;
OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @table_name;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC USP_DropEmptyColumns @table_name;
    FETCH NEXT FROM table_cursor INTO @table_name;
END;
CLOSE table_cursor;
DEALLOCATE table_cursor;

Related

Creating a Data Dictionary with example data (SQL)

I am looking to run a script on a SQL database to create a Data Dictionary with an example of the data for each field.
To keep it simple I would just like to include the data from the first row of each table along with each table name and column name
So something like this:
Table Name   Field Name   Example Data
Customer     ID           CU1
Customer     Title        Mrs
Customer     Name         Anne
Customer     Order No     ORD1
etc.
Is there an easy way to do this with a SQL script?
Somebody smarter than me could probably optimize this and remove the cursor, but the dynamic SQL was giving me a headache. I think a cursor is acceptable in this scenario.
DROP TABLE IF EXISTS ##DataDictionary
CREATE TABLE ##DataDictionary (TableName SYSNAME, ColumnName SYSNAME, SampleData NVARCHAR(MAX))
DECLARE @TableName SYSNAME
DECLARE @ColumnName SYSNAME
DECLARE @SQL NVARCHAR(MAX)
DECLARE cur CURSOR FOR
    SELECT t.name AS TableName,c.Name AS ColumnName
    FROM sys.tables t
    JOIN sys.columns c ON t.object_id = c.object_id
OPEN cur
FETCH cur INTO @TableName,@ColumnName
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @SQL = ''
    SELECT @SQL = '
    INSERT INTO ##DataDictionary(TableName,ColumnName,SampleData)
    SELECT '''+@TableName+''','''+@ColumnName+'''
    ,(SELECT TOP 1 '+QUOTENAME(@ColumnName)+' FROM '+QUOTENAME(@TableName)+' ORDER BY NEWID()) -- NewID randomly selects a sample row
    '
    print @SQL
    EXEC (@SQL)
    FETCH cur INTO @TableName,@ColumnName
END
CLOSE cur
DEALLOCATE cur
SELECT * from ##DataDictionary

Stored procedure to drop the column in SQL Server

I created many tables and have noticed that I created one useless column in all of them. I want to create a stored procedure that drops one specific column and can be used for all the tables.
I created this stored procedure but I'm getting an error. Help me please
You cannot parametrize table and column names with parameters - those are only valid for values - not for object names.
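To illustrate the point, here is a minimal sketch of what fails and what can be done instead. The table name Customer is hypothetical, and Phone stands in for the unwanted column; the statement is only printed so it can be inspected before anything is dropped.

```sql
-- Sketch only: dbo.Customer and Phone are placeholder names.
DECLARE @TableName sysname = N'Customer',
        @ColumnName sysname = N'Phone';

-- This does not compile: object names cannot be supplied as variables.
-- ALTER TABLE @TableName DROP COLUMN @ColumnName;

-- Build the statement as a string instead; QUOTENAME guards the identifiers.
DECLARE @Stmt nvarchar(max) =
    N'ALTER TABLE dbo.' + QUOTENAME(@TableName)
    + N' DROP COLUMN ' + QUOTENAME(@ColumnName) + N';';

PRINT @Stmt;  -- inspect first; run with EXEC sp_executesql @Stmt when satisfied
```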
If this is a one-time operation, the simplest option would be to generate the ALTER TABLE ... DROP COLUMN ... statements in SSMS using this code:
SELECT
'ALTER TABLE ' + SCHEMA_NAME(t.schema_id) + '.' + t.Name +
' DROP COLUMN Phone;'
FROM
sys.tables t
Execute this code in SSMS; its output is a list of statements that you can copy and paste into a new SSMS window and execute.
If you really want to do this as a stored procedure, you can apply the same basic idea - and then just use code (a cursor) to iterate over the commands being generated, and executing them - something like this:
CREATE PROCEDURE dbo.DropColumnFromAllTables (@ColumnName NVARCHAR(100))
AS
BEGIN
    DECLARE @SchemaName sysname, @TableName sysname
    -- define cursor over all tables which contain this column in question
    DECLARE DropCursor CURSOR LOCAL FAST_FORWARD
    FOR
        SELECT
            SchemaName = s.Name,
            TableName = t.Name
        FROM
            sys.tables t
        INNER JOIN
            sys.schemas s ON t.schema_id = s.schema_id
        WHERE
            EXISTS (SELECT * FROM sys.columns c
                    WHERE c.object_id = t.object_id
                    AND c.Name = @ColumnName);
    -- open cursor and start iterating over the tables found
    OPEN DropCursor
    FETCH NEXT FROM DropCursor INTO @SchemaName, @TableName
    WHILE @@FETCH_STATUS = 0
    BEGIN
        DECLARE @Stmt NVARCHAR(1000)
        -- generate the SQL statement
        SET @Stmt = N'ALTER TABLE [' + @SchemaName + '].[' + @TableName + '] DROP COLUMN [' + @ColumnName + ']';
        -- execute that SQL statement
        EXEC sp_executeSql @Stmt
        FETCH NEXT FROM DropCursor INTO @SchemaName, @TableName
    END
    CLOSE DropCursor
    DEALLOCATE DropCursor
END
This procedure should work.
It loops through all columns and drops each column where sum(col) is zero.
Take a backup of the table first.
alter procedure deletecolumnsifzero @tablename varchar(1000)
as
set nocount on
declare @n int
declare @sql nvarchar(1000)
declare @sum_cols int
declare @c_id nvarchar(100)
set @n = 0
declare c1 cursor for
    select column_name from information_schema.columns
    where table_name like @tablename
--Cursor Starts
open c1
fetch next from c1 into @c_id
while @@fetch_status = 0
begin
    -- note: SUM over a column that is entirely NULL returns NULL, not 0
    set @sql='select @sum_cols = sum(['+@c_id+']) from ['+@tablename+']'
    exec sp_executesql @sql,N'@sum_cols int out',@sum_cols out
    if(@sum_cols = 0)
    begin
        set @n=@n+1
        set @sql='alter table ['+@tablename+'] drop column ['+@c_id+']'
        exec sp_executesql @sql
    end
    fetch next from c1 into @c_id
end
close c1
deallocate c1

Spool Results from Cursor Into Table

I have a cursor that I'm using to find NULL columns in a database. I'm using this to eliminate NULL Columns from an upload of this data to Salesforce using dbAMP. I'd like to modify this to spool the results into a Table and include the Table Name and Column name.
declare @col varchar(255), @cmd varchar(max)
DECLARE getinfo cursor for
    SELECT c.name FROM sys.tables t JOIN sys.columns c ON t.Object_ID = c.Object_ID
    WHERE t.Name = 'Account'
OPEN getinfo
FETCH NEXT FROM getinfo into @col
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @cmd = 'IF NOT EXISTS (SELECT top 1 * FROM Account WHERE [' + @col + '] IS NOT NULL) BEGIN print ''' + @col + ''' end'
    EXEC(@cmd)
    FETCH NEXT FROM getinfo into @col
END
CLOSE getinfo
DEALLOCATE getinfo
I have not had any success in modifying this cursor to put the results in a table. Any guidance would be appreciated.
Change the PRINT to a SELECT, then INSERT INTO a table with the same column definition.
Create a table with the same columns in the same order.
Then put an INSERT INTO yourtable (your columns, in the same order as the output from the EXEC()) before the EXEC().
Any change in table columns in the future may break this. The table and the query should have the same columns. If you are cautious and control the order of columns in the select and insert, it shouldn't matter about the table column order, but it is still good practice imho.
Example (insert into table with dynamic sql)
if object_id('dbo.ColumnMatch','U') is not null drop table dbo.ColumnMatch;
create table dbo.ColumnMatch (
id int identity(1,1) not null primary key
,column_name varchar(512)
);
declare @col varchar(256) = 'This Column Name'
declare @s varchar(max) = 'select ''' + @col + '''';
insert into ColumnMatch (column_name)
exec(@s);
select * from ColumnMatch;
Use SELECT instead of PRINT, and put the INSERT INTO statement in front of the EXEC. :)
if object_id('dbo.ColumnMatch','U') is not null drop table dbo.ColumnMatch;
create table dbo.ColumnMatch (
id int identity(1,1) not null primary key
,column_name varchar(512)
);
declare @col varchar(255), @cmd varchar(max)
DECLARE getinfo cursor for
    SELECT c.name FROM sys.tables t JOIN sys.columns c ON t.Object_ID = c.Object_ID
    WHERE t.Name = 'Account'
OPEN getinfo
FETCH NEXT FROM getinfo into @col
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @cmd = 'IF NOT EXISTS (SELECT top 1 * FROM Account WHERE [' + @col + '] IS NOT NULL) BEGIN select ''' + @col + ''' column_name end'
    Insert into ColumnMatch (column_name)
    EXEC(@cmd)
    FETCH NEXT FROM getinfo into @col
END
CLOSE getinfo
DEALLOCATE getinfo
select * from ColumnMatch;

How can I search multiple fields and count nulls for all?

Is there an easy way to count nulls in all fields in a table without writing 40+ very similar, but slightly different, queries? I would think there is some kind of statistics maintained for all tables, and this may be the easiest way to go with it, but I don't know for sure. Thoughts, anyone? Thanks!!
BTW, I am using SQL Server 2008.
Not sure if you consider this simple or not, but this will total the NULLs by column in a table.
DECLARE @table sysname;
SET @table = 'MyTable'; --replace this with your table name
DECLARE @colname sysname;
DECLARE @sql NVARCHAR(MAX);
DECLARE COLS CURSOR FOR
    SELECT c.name
    FROM sys.tables t
    INNER JOIN sys.columns c ON t.object_id = c.object_id
    WHERE t.name = @table;
SET @sql = 'SELECT ';
OPEN COLS;
FETCH NEXT FROM COLS INTO @colname;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = @sql + 'COUNT(CASE WHEN ' + @colname + ' IS NULL THEN 1 END) AS ' + @colname + '_NULLS,'
    FETCH NEXT FROM COLS INTO @colname;
END;
CLOSE COLS;
DEALLOCATE COLS;
SET @sql = LEFT(@sql,LEN(@sql) - 1) --trim trailing ,
SET @sql = @sql + ' FROM ' + @table;
EXEC sp_executesql @sql;
SELECT COUNT( CASE WHEN field01 IS NULL THEN 1 END) +
COUNT( CASE WHEN field02 IS NULL THEN 1 END) +
...
COUNT( CASE WHEN field40 IS NULL THEN 1 END) as total_nulls
This answer will return a table containing the name of each column of a specified table. (@tab is the name of the table you're trying to count NULLs in.)
You can loop through the column names, count NULLs in each column, and add the result to a total running count.
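A rough sketch of that loop is shown below. It assumes the table name is held in a variable, here set to the placeholder 'MyTable', and it reuses the COUNT(CASE WHEN ... IS NULL ...) expression from the earlier answer; the other variable names are made up for illustration.

```sql
-- Sketch only: 'MyTable' is a placeholder table name.
DECLARE @tab sysname = N'MyTable';
DECLARE @col sysname, @sql nvarchar(max);
DECLARE @nulls int, @total int = 0;

DECLARE cols CURSOR LOCAL FAST_FORWARD FOR
    SELECT c.name
    FROM sys.columns AS c
    WHERE c.object_id = OBJECT_ID(@tab);

OPEN cols;
FETCH NEXT FROM cols INTO @col;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- count the NULLs in the current column and return them via an output parameter
    SET @sql = N'SELECT @n = COUNT(CASE WHEN ' + QUOTENAME(@col)
             + N' IS NULL THEN 1 END) FROM ' + QUOTENAME(@tab) + N';';
    EXEC sp_executesql @sql, N'@n int OUTPUT', @n = @nulls OUTPUT;
    SET @total = @total + @nulls;   -- add to the running total
    FETCH NEXT FROM cols INTO @col;
END;
CLOSE cols;
DEALLOCATE cols;

SELECT @total AS total_nulls;
```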

Looping through a column in SQL table that contains names of other tables

I am fairly new to SQL. I have a table with a column that contains the names of all the tables I want to use in one query. I want to loop through that column, go to each of those tables, and search one of its columns for a value (there could be multiple values). Whenever a table contains the value, I want to list the name of that table. Could someone give me a hint on how this is done? Is a cursor needed for this?
I don't have enough reputation to comment, but are the table names all stored in a single value, meaning they are comma-separated or marked with some other separator? That would make the query a little more complicated, as you would have to split the names before you start looping through your table.
Either way, this requires a cursor as well as some dynamic SQL.
I will give a basic example of how you can go about this.
declare @value varchar(50)
declare @tableName sysname
declare @sqlstring nvarchar(max)
declare @matches int
set @value = 'whateveryouwant'
create table #temptable (tableName sysname)
declare @getTableName cursor
set @getTableName = cursor for
    select tableName from TablewithTableNames
OPEN @getTableName
fetch NEXT from @getTableName into @tableName
while @@FETCH_STATUS = 0
BEGIN
    -- count matching rows; the search value is passed as a parameter
    set @sqlstring = N'Select @cnt = Count(*) from ' + quotename(@tableName) + N' where ColumnNameYouwant = @val'
    exec sp_executesql @sqlstring, N'@val varchar(50), @cnt int output', @val = @value, @cnt = @matches output
    If @matches > 0
        insert into #temptable values (@tableName)
    fetch next from @getTableName into @tableName
END
close @getTableName
deallocate @getTableName
select * from #temptable
drop table #temptable
I can't test this right now because of time constraints, but this is how I would go about it.
You could try something like this:
--Generate dynamic SQL
DECLARE @TablesToSearch TABLE (
    TableName VARCHAR(50));
INSERT INTO @TablesToSearch VALUES ('invoiceTbl');
DECLARE @SQL TABLE (
    RowNum INT,
    SQLText VARCHAR(500));
INSERT INTO
    @SQL
SELECT
    ROW_NUMBER() OVER (ORDER BY ts.TableName) AS RowNum,
    'SELECT * FROM ' + ts.TableName + ' WHERE ' + c.name + ' = 1;'
FROM
    @TablesToSearch ts
    INNER JOIN sys.tables t ON t.name = ts.TableName
    INNER JOIN sys.columns c ON c.object_id = t.object_id;
--Now run the queries
DECLARE @Count INT;
SELECT @Count = COUNT(*) FROM @SQL;
WHILE @Count > 0
BEGIN
    DECLARE @RowNum INT;
    DECLARE @SQLText VARCHAR(500);
    SELECT TOP 1 @RowNum = RowNum, @SQLText = SQLText FROM @SQL;
    EXEC (@SQLText);
    DELETE FROM @SQL WHERE RowNum = @RowNum;
    SELECT @Count = COUNT(*) FROM @SQL;
END;
You would need to change the "1" I am using as an example to the value you are looking for and probably add a CONVERT/ CAST to make sure the column is the right data type?
You actually said that you wanted the name of the table, so you would need to change the SQL to:
'SELECT ''' + ts.TableName + ''' FROM ' + ts.TableName + ' WHERE ' + c.name + ' = 1;'
Another thought, it would probably be best to insert the results from this into a temporary table so you can dump out the results in one go at the end?
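Collecting the results in a temporary table could look roughly like this. It is only a sketch: invoiceTbl comes from the example above, while #Matches and SomeColumn are made-up names, and a single generated statement stands in for the loop.

```sql
-- Sketch only: #Matches and SomeColumn are hypothetical names.
CREATE TABLE #Matches (TableName sysname);

DECLARE @SQLText nvarchar(500) =
    N'SELECT TOP (1) ''invoiceTbl'' FROM invoiceTbl WHERE SomeColumn = 1;';

-- INSERT ... EXEC captures the result set of the dynamic statement.
INSERT INTO #Matches (TableName)
EXEC (@SQLText);

-- After all generated statements have run, return everything in one go.
SELECT DISTINCT TableName FROM #Matches;
DROP TABLE #Matches;
```

Inside the WHILE loop, the INSERT ... EXEC pair would replace the plain EXEC, so each matching table adds one row to #Matches.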