SQL: Find occurrences of a value in a table regardless of columns

I was recently asked in an interview how I would get counts of apples, bananas and oranges in a table regardless of column information. The interviewer asked me to provide counts of occurrences of apples and bananas and to skip oranges.
I have never written a query before without a column name, please help.
Thanks

Credit for this approach goes to @GordonLinoff. Assume that you have 5 columns in the table:
select col_value, count(*) cnt
from (select (case when cols.num = 1 then t.col_1
                   when cols.num = 2 then t.col_2
                   when cols.num = 3 then t.col_3
                   when cols.num = 4 then t.col_4
                   when cols.num = 5 then t.col_5
              end) as col_value
      from your_table t cross join
           (select level as num from dual connect by level <= 5) cols)
where col_value in ('Apple', 'Banana')
group by col_value
order by 1;
The table will be full-scanned, but only once, so it is more efficient than a UNION ALL over every column. You can also rewrite this query using column information from the data dictionary and dynamic SQL so that it applies to any table and any number of columns, as sketched below.
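A rough sketch of that generalization (my own, not part of the original answer), assuming Oracle 11g or later and a table named YOUR_TABLE whose columns are all character types; the CASE branches are built from USER_TAB_COLUMNS and the generated query is opened as a cursor:
DECLARE
  v_cases CLOB;
  v_count PLS_INTEGER := 0;
  v_sql   CLOB;
  v_cur   SYS_REFCURSOR;
BEGIN
  -- build one CASE branch per column of the assumed table
  FOR c IN (SELECT column_name, column_id
              FROM user_tab_columns
             WHERE table_name = 'YOUR_TABLE'
             ORDER BY column_id) LOOP
    v_cases := v_cases || ' when cols.num = ' || c.column_id ||
               ' then t.' || c.column_name;
    v_count := c.column_id;
  END LOOP;

  v_sql := 'select col_value, count(*) cnt from '
        || '(select (case' || v_cases || ' end) as col_value '
        || ' from your_table t cross join '
        || ' (select level as num from dual connect by level <= '
        || v_count || ') cols) '
        || 'where col_value in (''Apple'', ''Banana'') '
        || 'group by col_value order by 1';

  OPEN v_cur FOR v_sql;
  -- fetch from v_cur in the caller to read the counts
END;
/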

Here is the way I would approach the issue. Note that I used a table in the database I have the misfortune to support. Also, this would need some modification, and one would then need to query the table where the rows were inserted (see the sketch after the script).
declare @columnName nvarchar(128)
declare @command nvarchar(250)
--drop table interview_idiocy
create table interview_idiocy
(
    Column_name varchar(128),
    Fruit varchar(50)
)
declare interview_idiocy cursor for
    select column_name
    from information_schema.columns
    where table_name = 'People'
      and data_type in ('varchar', 'char')
open interview_idiocy
fetch next from interview_idiocy into @columnName
while @@FETCH_STATUS = 0
begin
    set @command = 'insert interview_idiocy select count(' + @columnName + '),' + @columnName + ' from people where ' + @columnName + ' = ''apple'' group by ' + @columnName
    exec sp_executesql @command
    print @command
    fetch next from interview_idiocy into @columnName
end
close interview_idiocy
deallocate interview_idiocy
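As a rough follow-up (my own sketch), since the script above puts the count into Column_name and the fruit value into Fruit, the final totals could be read back like this:
-- Read back the per-column counts collected by the cursor and total them per fruit
select Fruit, sum(cast(Column_name as int)) as occurrences
from interview_idiocy
group by Fruit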

Related

Select from a table only the columns that are not empty?

I have a table with hundreds of columns:
------------------------------------------------
ID ColA ColB ColC ColD ... ColZZZ
------------------------------------------------
1 bla
2 foo
3 bar
4 baz
------------------------------------------------
I need to know which columns have no values in them (that is, which are empty '', not NULL).
I could create a query for every column:
select count(1) from [table] where [ColA] <> ''; -- returns 2, so no, not empty
select count(1) from [table] where [ColB] <> ''; -- returns 1, so no
select count(1) from [table] where [ColC] <> ''; -- returns 0, so yay! found an empty one
...
etc
But there has to be an easier way to do this?
Is there a way to return [table] without the empty columns, in other words:
----------------------------
ID ColA ColB ColZZZ
----------------------------
1 bla
2 foo
3 bar
4 baz
----------------------------
Here is a solution. I used this query before to search for empty columns across all tables; slightly modified now to search for non-empty ones, it might have a few extra parts not needed in your example.
You create a temp table to store the column names that are not empty, and use a cursor to create dynamic SQL to search for them.
In the end, just generate another dynamic SQL statement to select columns based on the temp table results.
IF (OBJECT_ID('tempdb..#tmpRez') IS NOT NULL) DROP TABLE #tmpRez;
CREATE TABLE #tmpRez (TableName sysname, ColName sysname);
DECLARE crs CURSOR LOCAL FAST_FORWARD FOR
    SELECT t.name, c.name FROM sys.tables t
    INNER JOIN sys.columns c ON c.object_id = t.object_id
    WHERE 1=1
      AND t.name = 'Table1' -- OR your own condition
OPEN crs;
DECLARE @tbl sysname;
DECLARE @col sysname;
DECLARE @sql NVARCHAR(MAX);
FETCH NEXT FROM crs INTO @tbl, @col;
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @sql = 'IF EXISTS (SELECT * FROM [' + @tbl + '] WHERE [' + @col + '] <> '''') INSERT INTO #tmpRez SELECT ''' + @tbl + ''',''' + @col + '''';
    EXEC(@sql);
    FETCH NEXT FROM crs INTO @tbl, @col;
END;
CLOSE crs;
DEALLOCATE crs;
SELECT @sql = 'SELECT ' + STUFF((SELECT ',' + ColName FROM #tmpRez x
                                 WHERE x.TableName = y.TableName
                                 FOR XML PATH ('')), 1, 1, '') + ' FROM ' + TableName
FROM #tmpRez y GROUP BY TableName
EXEC (@sql)
How about this to return the rows where a column is not empty:
SELECT * from table
WHERE column IS NOT NULL AND TRIM(column) <> ''
And this to return the rows where the column is empty:
SELECT * from table
WHERE column IS NULL OR TRIM(column) = ''

How to count the amount of data in each column in a table

I have a table in SQL Server that has 490 columns and today I need to add some more. I have an API that populates this table from an external system and currently it is taking about 16 hours to sync, as there are ~550,000 rows in the said table. I need to count the number of rows in use in each column to see if there are any I can remove.
I've looked into this issue for a little while now and resorted to posting here as a last-ditch effort. I have tried a few different ways but nothing is quite hitting what I need. I know I could go through and do COUNT(column_name), but there are 490 columns and this isn't really feasible.
So I am currently using the sys.columns table to get a list of the columns in the said table, and then using an outer apply with COUNT(*) from the table. This is kinda working, but it is obviously just returning the total number of rows in the table for each column.
I think I need to replace the COUNT(*) with a COUNT(sys.columns.name), but that doesn't work either; it returns an "Aggregates on the right side of an APPLY cannot reference columns from the left side." error.
The code I feel is currently closest is as follows, but I could be a million miles away.
SELECT
name as 'Column',
Counter.total
FROM sys.columns WITH (NOLOCK)
OUTER APPLY
(
SELECT TOP 1
COUNT(*) as total
FROM lead WITH (nolock)
) as Counter
WHERE sys.columns.object_id = 544720993
This throws back the following -
Column | total
______________________
Column1 | 512345
Column2 | 512345
Column3 | 512345
Column4 | 512345
Column5 | 512345
However, in an ideal world, I would like the following
Column | total
______________________
Column1 | 512345 --(meaning no nulls in this column)
Column2 | 435765 --(meaning some nulls in this column)
Column3 | 123423
Column4 | 76     --(meaning only 76 non-nulls in this column)
Column5 | 0      --(meaning every row is null in this column)
Thank you for your time!
Sample Data
CREATE TABLE [dbo].[Tp](
[a] [char](2) NULL,
[b] [char](2) NULL,
[c] [char](2) NULL
) ON [PRIMARY]
GO
INSERT INTO [Tp] ([a],[b],[c])VALUES('a','a','a')
INSERT INTO [Tp] ([a],[b],[c])VALUES('1','1','1')
INSERT INTO [Tp] ([a],[b],[c])VALUES('2','2','2')
INSERT INTO [Tp] ([a],[b],[c])VALUES(NULL,'9',NULL)
INSERT INTO [Tp] ([a],[b],[c])VALUES('3','3','3')
INSERT INTO [Tp] ([a],[b],[c])VALUES('4','4','4')
INSERT INTO [Tp] ([a],[b],[c])VALUES(NULL,NULL,NULL)
INSERT INTO [Tp] ([a],[b],[c])VALUES(NULL,'7',NULL)
INSERT INTO [Tp] ([a],[b],[c])VALUES(NULL,NULL,NULL)
INSERT INTO [Tp] ([a],[b],[c])VALUES('8','8','8')
INSERT INTO [Tp] ([a],[b],[c])VALUES('9','9','9')
INSERT INTO [Tp] ([a],[b],[c])VALUES(NULL,NULL,NULL)
INSERT INTO [Tp] ([a],[b],[c])VALUES('','','')
INSERT INTO [Tp] ([a],[b],[c])VALUES('','','')
INSERT INTO [Tp] ([a],[b],[c])VALUES('','5','')
INSERT INTO [Tp] ([a],[b],[c])VALUES('2','','')
SELECT * FROM [Tp]
Dynamic SQL script to get the expected result
DECLARE @ColumnCount nvarchar(max),
        @Sql nvarchar(max)
SELECT @Sql = STUFF((SELECT ' UNION ALL '+ ' '+'SELECT '''+TABLE_NAME+''' AS TABLE_NAME,'+''''+COLUMN_NAME+''''+' AS ColumName'+',SUM(CASE WHEN '+COLUMN_NAME+' IS NULL THEN 1 ELSE 0 END) As Countof_nulls
,SUM(CASE WHEN ISNULL(NULLIF('+COLUMN_NAME+',''''),''1'')=''1'' THEN 1 ELSE 0 END) As CountOf_EmptySpace
,COUNT('+COLUMN_NAME+') As Count_not_nulls
FROM '+TABLE_NAME
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME ='Tp' --Enter your table in the query
FOR XML PATH (''), TYPE).value('.', 'VARCHAR(MAX)'),1,10,'')
EXEC (@Sql)
Result
TABLE_NAME ColumName Countof_nulls CountOf_EmptySpace Count_not_nulls
***************************************************************************
Tp a 5 9 11
Tp b 3 7 13
Tp c 5 10 11
You can use a cursor with dynamic SQL that inserts each COUNT check into a temporary table.
You can control the schema, tables and columns to check with the cursor's SELECT.
IF OBJECT_ID('tempdb..#ColumnResults') IS NOT NULL
    DROP TABLE #ColumnResults

CREATE TABLE #ColumnResults (
    SchemaName VARCHAR(100),
    TableName VARCHAR(100),
    ColumnName VARCHAR(100),
    TotalRows INT,
    NotNullAmount INT)

DECLARE @SchemaName VARCHAR(100)
DECLARE @TableName VARCHAR(100)
DECLARE @ColumnName VARCHAR(100)

DECLARE ColumnCursor CURSOR FOR
    SELECT
        QUOTENAME(T.TABLE_SCHEMA),
        QUOTENAME(T.TABLE_NAME),
        QUOTENAME(T.COLUMN_NAME)
    FROM
        INFORMATION_SCHEMA.COLUMNS AS T
    WHERE
        T.TABLE_NAME = 'YourTableName' AND -- Filter here the table you want to check
        T.TABLE_SCHEMA = 'YourTableSchema' -- Filter here the schema you want to check
    ORDER BY
        T.TABLE_SCHEMA,
        T.TABLE_NAME,
        T.COLUMN_NAME

OPEN ColumnCursor

FETCH NEXT FROM ColumnCursor INTO
    @SchemaName,
    @TableName,
    @ColumnName

WHILE @@FETCH_STATUS = 0
BEGIN
    DECLARE @DynamicSQL VARCHAR(MAX) = '
        INSERT INTO #ColumnResults (
            SchemaName,
            TableName,
            ColumnName,
            TotalRows,
            NotNullAmount)
        SELECT
            SchemaName = ''' + @SchemaName + ''',
            TableName = ''' + @TableName + ''',
            ColumnName = ''' + @ColumnName + ''',
            TotalRows = COUNT(1),
            NotNullAmount = COUNT(' + @ColumnName + ')
        FROM
            ' + @SchemaName + '.' + @TableName + ' AS T'

    -- PRINT (@DynamicSQL)
    EXEC (@DynamicSQL)

    FETCH NEXT FROM ColumnCursor INTO
        @SchemaName,
        @TableName,
        @ColumnName
END

CLOSE ColumnCursor
DEALLOCATE ColumnCursor

SELECT
    C.*
FROM
    #ColumnResults AS C
ORDER BY
    C.SchemaName,
    C.TableName,
    C.ColumnName
You can comment out the EXEC and uncomment the PRINT to check the dynamic SQL before executing it.
Note that this will actually execute one SELECT for each column instead of a single SELECT for all columns in a table. You could tweak the dynamic SQL a little so it runs once per table while checking all columns (a sketch follows), but I find this approach tidier and capable of working across schemas and tables in the same manner.
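For reference, here is a minimal sketch of that per-table variant (my own addition, assuming a single table dbo.YourTableName): it builds one SELECT that returns the non-NULL count of every column at once.
-- Build one SELECT with a COUNT(column) per column of the assumed table
DECLARE @sql NVARCHAR(MAX);
SELECT @sql = 'SELECT ' + STUFF((
        SELECT ', ' + QUOTENAME('NotNull_' + c.COLUMN_NAME) + ' = COUNT(' + QUOTENAME(c.COLUMN_NAME) + ')'
        FROM INFORMATION_SCHEMA.COLUMNS AS c
        WHERE c.TABLE_SCHEMA = 'dbo'          -- assumed schema
          AND c.TABLE_NAME = 'YourTableName'  -- assumed table
        FOR XML PATH('')), 1, 2, '')
    + ' FROM dbo.YourTableName';
EXEC (@sql);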

How can I find potential not null columns?

I'm working with a SQL Server database which is very light on constraints and want to apply some not null constraints. Is there any way to scan all nullable columns in the database and select which ones do not contain any nulls or even better count the number of null values?
Perhaps with a little dynamic SQL
Example
Declare @SQL varchar(max) = '>>>'
Select @SQL = @SQL
    + 'Union All Select TableName=''' + quotename(Table_Schema) + '.' + quotename(Table_Name) + ''''
    + ',ColumnName=''' + quotename(Column_Name) + ''''
    + ',NullValues=count(*)'
    + ' From ' + quotename(Table_Schema) + '.' + quotename(Table_Name)
    + ' Where ' + quotename(Column_Name) + ' is null '
From INFORMATION_SCHEMA.COLUMNS
Where Is_Nullable='YES'
Select @SQL = 'Select * from (' + replace(@SQL,'>>>Union All ','') + ') A Where NullValues>0'
Exec(@SQL)
Returns (for example)
TableName ColumnName NullValues
[dbo].[OD-Map] [Map-Val2] 185
[dbo].[OD-Map] [Map-Val3] 225
[dbo].[OD-Map] [Map-Val4] 225
For all tables/columns with counts >= 0
...
Select @SQL = replace(@SQL,'>>>Union All ','')
Exec(@SQL)
Check this query. This was originally written by Linda Lawton
Original Article: https://www.daimto.com/sql-server-finding-columns-with-null-values
Finding columns with null values in your Database - Find Nulls Script
set nocount on

declare @columnName nvarchar(500)
declare @tableName nvarchar(500)
declare @select nvarchar(500)
declare @sql nvarchar(500)

-- check if the temp table already exists
if OBJECT_ID('tempdb..#LocalTempTable') is null
begin
    CREATE TABLE #LocalTempTable(
        TableName varchar(150),
        ColumnName varchar(150))
end
else
begin
    truncate table #LocalTempTable;
end

-- Build a select for each of the columns in the database that checks for nulls
DECLARE check_cursor CURSOR FOR
    select column_name, table_name, concat(' Select ''',column_name,''',''',table_name,''' from ',table_name,' where [',COLUMN_NAME,'] is null')
    from INFORMATION_SCHEMA.COLUMNS

OPEN check_cursor
FETCH NEXT FROM check_cursor
INTO @columnName, @tableName, @select

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Insert a row for every record where the column is null
    set @sql = 'insert into #LocalTempTable (ColumnName, TableName)' + @select
    print @sql
    -- Run the statement
    exec(@sql)
    FETCH NEXT FROM check_cursor
    INTO @columnName, @tableName, @select
end

CLOSE check_cursor;
DEALLOCATE check_cursor;

SELECT TableName, ColumnName, COUNT(TableName) 'Count'
FROM #LocalTempTable
GROUP BY TableName, ColumnName
ORDER BY TableName
The query result lists each table and column that contains NULLs, together with the count of NULL rows.
This will tell you which columns in your database are currently NULLABLE.
USE <Your_DB_Name>
GO
SELECT o.name AS Table_Name
, c.name AS Column_Name
FROM sys.objects o
INNER JOIN sys.columns c ON o.object_id = c.object_id
AND c.is_nullable = 1 /* 1 = NULL, 0 = NOT NULL */
WHERE o.type_desc = 'USER_TABLE'
AND o.type NOT IN ('PK','F','D') /* not a Primary, Foreign or Default key */
Yes, it is fairly straightforward. Note: if the table contains a lot of records, I suggest using SELECT TOP 1000 * instead of SELECT *.
-- Identify records where a specific column is NOT NULL
SELECT *
FROM TableName
WHERE ColumnName IS NOT NULL
-- Identify the count of records where a specific column contains NULL
SELECT COUNT(1)
FROM TableName
WHERE ColumnName IS NULL
-- Identify all NULLable columns in a database
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES'
For more information on the INFORMATION_SCHEMA views, see this: https://learn.microsoft.com/en-us/sql/relational-databases/system-information-schema-views/system-information-schema-views-transact-sql
If you want to scan all tables and columns in a given database for NULLs, then it is a two step process.
1.) Get the list of tables and columns that are NULLABLE.
-- Identify all NULLable columns in a database
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES'
2.) Use Excel to create a SELECT statement to get the NULL counts for each table/column. To do this, copy and paste the query results from step 1 into EXCEL. Assuming you have copied the header row, then your data starts on row 2. In cell E2, enter the following formula.
="SELECT COUNT(1) FROM "&A2&"."&B2&"."&C2&" WHERE "&D2&" IS NULL"
Copy and paste that formula down the entire sheet. This will generate the SQL SELECT statements that you require. Copy the results in column E, paste them into SQL Server, and run them. This may take a while depending on the number of tables/columns to scan. If you would rather skip Excel, the same statements can be generated directly in T-SQL, as sketched below.
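A minimal sketch (my own, not from the answer) that generates the same per-column COUNT statements without Excel; run it in the target database and then execute the generated statements:
-- Generate one "count the NULLs" statement per nullable column
SELECT 'SELECT COUNT(1) AS NullCount FROM '
       + QUOTENAME(TABLE_CATALOG) + '.' + QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)
       + ' WHERE ' + QUOTENAME(COLUMN_NAME) + ' IS NULL;' AS GeneratedSql
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES'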

How can I search multiple fields and count nulls for all?

Is there an easy way to count nulls in all fields in a table without writing 40+ very similar, but slightly different, queries? I would think there is some kind of statistics maintained for all tables, and this may be the easiest way to go with it, but I don't know for sure. Thoughts, anyone? Thanks!!
BTW, I am using SQL Server 2008.
Not sure if you consider this simple or not, but this will total the NULLs by column in a table.
DECLARE @table sysname;
SET @table = 'MyTable'; -- replace this with your table name

DECLARE @colname sysname;
DECLARE @sql NVARCHAR(MAX);

DECLARE COLS CURSOR FOR
    SELECT c.name
    FROM sys.tables t
    INNER JOIN sys.columns c ON t.object_id = c.object_id
    WHERE t.name = @table;

SET @sql = 'SELECT ';

OPEN COLS;
FETCH NEXT FROM COLS INTO @colname;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = @sql + 'COUNT(CASE WHEN ' + @colname + ' IS NULL THEN 1 END) AS ' + @colname + '_NULLS,'
    FETCH NEXT FROM COLS INTO @colname;
END;
CLOSE COLS;
DEALLOCATE COLS;

SET @sql = LEFT(@sql, LEN(@sql) - 1) -- trim trailing ,
SET @sql = @sql + ' FROM ' + @table;

EXEC sp_executesql @sql;
SELECT COUNT( CASE WHEN field01 IS NULL THEN 1 END) +
COUNT( CASE WHEN field02 IS NULL THEN 1 END) +
...
COUNT( CASE WHEN field40 IS NULL THEN 1 END) as total_nulls
This answer will return a table containing the name of each column of a specified table. (@tab is the name of the table you're trying to count NULLs in.)
You can loop through the column names, count NULLs in each column, and add the result to a total running count.
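The linked answer is not reproduced here, but a minimal sketch of the loop it describes might look like this (assumptions of mine: the table is dbo.MyTable and the column list comes from INFORMATION_SCHEMA.COLUMNS):
-- Hypothetical sketch: count NULLs per column and keep a running total
DECLARE @col sysname, @sql NVARCHAR(MAX), @cnt INT, @total INT = 0;
DECLARE col_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'MyTable';
OPEN col_cur;
FETCH NEXT FROM col_cur INTO @col;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'SELECT @c = COUNT(1) FROM dbo.MyTable WHERE ' + QUOTENAME(@col) + N' IS NULL';
    EXEC sp_executesql @sql, N'@c INT OUTPUT', @c = @cnt OUTPUT;
    SET @total = @total + @cnt; -- add this column's NULL count to the running total
    FETCH NEXT FROM col_cur INTO @col;
END;
CLOSE col_cur;
DEALLOCATE col_cur;
SELECT @total AS TotalNulls;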

Selecting column names that have specified value

We are receiving rather large files, whose format we have no control over, that are being bulk-loaded into a SQL Server table via SSIS to be later imported into our internal structure. These files can contain over 800 columns, and often the column names are not immediately recognizable.
As a result, we have a large table that represents the contents of the file with over 800 Varchar columns.
The problem is: I know what specific values I'm looking for in this data, but I do not know what column contains it. And eyeballing the data to find said column is neither efficient nor ideal.
My question is: is it at all possible to search a table by some value N and return the column names that have that value? I'd post some code that I've tried, but I really don't know where to start on this one... or if it's even possible.
For example:
A B C D E F G H I J K L M N ...
------------------------------------------------------------
'a' 'a' 'a' 'a' 'a' 'b' 'a' 'a' 'a' 'b' 'b' 'a' 'a' 'c' ...
If I were to search this table for the value 'b', I would want to get back the following results:
Columns
---------
F
J
K
Is something like this possible to do?
This script will search all tables and all string columns for a specific string. You might be able to adapt this for your needs:
DECLARE @tableName sysname
DECLARE @columnName sysname
DECLARE @value varchar(100)
DECLARE @sql varchar(2000)
DECLARE @sqlPreamble varchar(100)

SET @value = 'EDUQ4' -- *** Set this to the value you're searching for *** --
SET @sqlPreamble = 'IF EXISTS (SELECT 1 FROM '

DECLARE theTableCursor CURSOR FAST_FORWARD FOR
    SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA = 'dbo' AND TABLE_TYPE = 'BASE TABLE'
      AND TABLE_NAME NOT LIKE '%temp%' AND TABLE_NAME != 'dtproperties' AND TABLE_NAME != 'sysdiagrams'
    ORDER BY TABLE_NAME

OPEN theTableCursor
FETCH NEXT FROM theTableCursor INTO @tableName
WHILE @@FETCH_STATUS = 0 -- spin through Table entries
BEGIN
    DECLARE theColumnCursor CURSOR FAST_FORWARD FOR
        SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_NAME = @tableName AND (DATA_TYPE = 'nvarchar' OR DATA_TYPE = 'varchar')
        ORDER BY ORDINAL_POSITION

    OPEN theColumnCursor
    FETCH NEXT FROM theColumnCursor INTO @columnName
    WHILE @@FETCH_STATUS = 0 -- spin through Column entries
    BEGIN
        SET @sql = @tableName + ' WHERE ' + @columnName + ' LIKE ''' + @value +
            ''') PRINT ''Value found in Table: ' + @tableName + ', Column: ' + @columnName + ''''
        EXEC (@sqlPreamble + @sql)
        FETCH NEXT FROM theColumnCursor INTO @columnName
    END
    CLOSE theColumnCursor
    DEALLOCATE theColumnCursor

    FETCH NEXT FROM theTableCursor INTO @tableName
END
CLOSE theTableCursor
DEALLOCATE theTableCursor
One option you have is to be a little creative using XML in SQL Server.
Turn one row at a time into XML using a cross apply, and query for the nodes that have a certain value in a second cross apply.
Finally you output the distinct list of node names.
declare @Value nvarchar(max)
set @Value = 'b'

select distinct T3.X.value('local-name(.)', 'nvarchar(128)') as ColName
from YourTable as T1
cross apply (select T1.* for xml path(''), type) as T2(X)
cross apply T2.X.nodes('*[text() = sql:variable("@Value")]') as T3(X)
SQL Fiddle
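If you want to try it, here is a hypothetical reproduction of the question's example row (table name and column types are my assumption); with @Value = 'b' the query above returns F, J and K:
-- One-row table matching the example in the question
create table YourTable (A varchar(5), B varchar(5), C varchar(5), D varchar(5),
                        E varchar(5), F varchar(5), G varchar(5), H varchar(5),
                        I varchar(5), J varchar(5), K varchar(5), L varchar(5),
                        M varchar(5), N varchar(5))
insert into YourTable values ('a','a','a','a','a','b','a','a','a','b','b','a','a','c')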
If you have access to the files, a RegEx search will be much faster than performing a generic search in SQL.
If you are forced to use SQL, @pmbAustin's answer is the way to go. Be warned, it won't run quickly.