Selecting column names that have specified value - sql

We are receiving rather large files, whose format we have no control over, that are being bulk-loaded into a SQL Server table via SSIS to be later imported into our internal structure. These files can contain over 800 columns, and often the column names are not immediately recognizable.
As a result, we have a large table that represents the contents of the file with over 800 Varchar columns.
The problem is: I know what specific values I'm looking for in this data, but I do not know what column contains it. And eyeballing the data to find said column is neither efficient nor ideal.
My question is: is it at all possible to search a table by some value N and return the column names that have that value? I'd post some code that I've tried, but I really don't know where to start on this one... or if it's even possible.
For example:
A B C D E F G H I J K L M N ...
------------------------------------------------------------
'a' 'a' 'a' 'a' 'a' 'b' 'a' 'a' 'a' 'b' 'b' 'a' 'a' 'c' ...
If I were to search this table for the value 'b', I would want to get back the following results:
Columns
---------
F
J
K
Is something like this possible to do?

This script will search all tables and all string columns for a specific string. You might be able to adapt this for your needs:
DECLARE @tableName sysname
DECLARE @columnName sysname
DECLARE @value varchar(100)
DECLARE @sql varchar(2000)
DECLARE @sqlPreamble varchar(100)
SET @value = 'EDUQ4' -- *** Set this to the value you're searching for *** --
SET @sqlPreamble = 'IF EXISTS (SELECT 1 FROM '
DECLARE theTableCursor CURSOR FAST_FORWARD FOR
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'dbo' AND TABLE_TYPE = 'BASE TABLE'
AND TABLE_NAME NOT LIKE '%temp%' AND TABLE_NAME != 'dtproperties' AND TABLE_NAME != 'sysdiagrams'
ORDER BY TABLE_NAME
OPEN theTableCursor
FETCH NEXT FROM theTableCursor INTO @tableName
WHILE @@FETCH_STATUS = 0 -- spin through Table entries
BEGIN
    DECLARE theColumnCursor CURSOR FAST_FORWARD FOR
    SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = @tableName AND (DATA_TYPE = 'nvarchar' OR DATA_TYPE = 'varchar')
    ORDER BY ORDINAL_POSITION
    OPEN theColumnCursor
    FETCH NEXT FROM theColumnCursor INTO @columnName
    WHILE @@FETCH_STATUS = 0 -- spin through Column entries
    BEGIN
        -- QUOTENAME protects against table/column names with spaces or other special characters
        SET @sql = QUOTENAME(@tableName) + ' WHERE ' + QUOTENAME(@columnName) + ' LIKE ''' + @value +
            ''') PRINT ''Value found in Table: ' + @tableName + ', Column: ' + @columnName + ''''
        EXEC (@sqlPreamble + @sql)
        FETCH NEXT FROM theColumnCursor INTO @columnName
    END
    CLOSE theColumnCursor
    DEALLOCATE theColumnCursor
    FETCH NEXT FROM theTableCursor INTO @tableName
END
CLOSE theTableCursor
DEALLOCATE theTableCursor

One option you have is to be a little creative using XML in SQL Server.
Turn one row at a time into XML using cross apply, then query for the nodes that have a certain value in a second cross apply.
Finally you output the distinct list of node names.
declare @Value nvarchar(max)
set @Value = 'b'
select distinct T3.X.value('local-name(.)', 'nvarchar(128)') as ColName
from YourTable as T1
cross apply (select T1.* for xml path(''), type) as T2(X)
cross apply T2.X.nodes('*[text() = sql:variable("@Value")]') as T3(X)

If you have access to the files, a regex will be much faster than performing a generic search in SQL.
If you are forced to use SQL, @pmbAustin's answer is the way to go. Be warned, it won't run quickly.
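If the raw file is still available, the file-side search could be sketched roughly as follows. This is only an illustration in Python, not something taken from the answers: it assumes a delimited text file with a header row, and the function name `columns_containing` and the sample data are made up. It uses an exact string comparison; a regex match could be swapped in where patterns are needed.

```python
import csv
import io


def columns_containing(lines, value, delimiter=","):
    """Return header names of every column holding `value` in at least one row.

    `lines` is any iterable of text lines (an open file works); the first
    line is assumed to be the header row.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    hits = set()
    for row in reader:
        for index, cell in enumerate(row):
            # Ignore cells beyond the header width (ragged rows).
            if cell == value and index < len(header):
                hits.add(index)
    # Report the matches in file column order.
    return [header[i] for i in sorted(hits)]


# Hypothetical sample data in the spirit of the question's example.
sample = io.StringIO("A,B,C,D,E,F\na,a,a,a,a,b\nb,a,a,a,a,a\n")
print(columns_containing(sample, "b"))
```

For a real file, passing `open(path, newline="")` instead of the `StringIO` object should work the same way, since the function only iterates over lines.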

Related

How can I find potential not null columns?

I'm working with a SQL Server database which is very light on constraints and want to apply some not null constraints. Is there any way to scan all nullable columns in the database and select which ones do not contain any nulls or even better count the number of null values?
Perhaps with a little dynamic SQL
Example
Declare @SQL varchar(max) = '>>>'
Select @SQL = @SQL
+ 'Union All Select TableName='''+quotename(Table_Schema)+'.'+quotename(Table_Name)+''''
+',ColumnName='''+quotename(Column_Name)+''''
+',NullValues=count(*)'
+' From '+quotename(Table_Schema)+'.'+quotename(Table_Name)
+' Where '+quotename(Column_Name)+' is null '
From INFORMATION_SCHEMA.COLUMNS
Where Is_Nullable='YES'
Select @SQL='Select * from (' + replace(@SQL,'>>>Union All ','') + ') A Where NullValues>0'
Exec(@SQL)
Returns (for example)
TableName ColumnName NullValues
[dbo].[OD-Map] [Map-Val2] 185
[dbo].[OD-Map] [Map-Val3] 225
[dbo].[OD-Map] [Map-Val4] 225
For all table/columns with counts >= 0
...
Select @SQL=replace(@SQL,'>>>Union All ','')
Exec(@SQL)
Check this query. This was originally written by Linda Lawton
Original Article: https://www.daimto.com/sql-server-finding-columns-with-null-values
Finding columns with null values in your Database - Find Nulls Script
set nocount on
declare @columnName nvarchar(500)
declare @tableName nvarchar(500)
declare @select nvarchar(max)
declare @sql nvarchar(max)
-- Check whether the temp table already exists
if OBJECT_ID('tempdb..#LocalTempTable') is null
Begin
    CREATE TABLE #LocalTempTable(
    TableName varchar(150),
    ColumnName varchar(150))
end
else
begin
    truncate table #LocalTempTable;
end
-- Build a select for each column in the database that returns one row per NULL value
DECLARE check_cursor CURSOR FOR
select column_name, table_name, concat(' Select ''',column_name,''',''',table_name,''' from [',table_name,'] where [',COLUMN_NAME,'] is null')
from INFORMATION_SCHEMA.COLUMNS
OPEN check_cursor
FETCH NEXT FROM check_cursor
INTO @columnName, @tableName, @select
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Insert one row into the temp table for every NULL found
    set @sql = 'insert into #LocalTempTable (ColumnName, TableName)' + @select
    print @sql
    -- Run the statement
    exec(@sql)
    FETCH NEXT FROM check_cursor
    INTO @columnName, @tableName, @select
end
CLOSE check_cursor;
DEALLOCATE check_cursor;
SELECT TableName, ColumnName, COUNT(TableName) 'Count'
FROM #LocalTempTable
GROUP BY TableName, ColumnName
ORDER BY TableName
The query result would be something like this.
This will tell you which columns in your database are currently NULLABLE.
USE <Your_DB_Name>
GO
SELECT o.name AS Table_Name
, c.name AS Column_Name
FROM sys.objects o
INNER JOIN sys.columns c ON o.object_id = c.object_id
AND c.is_nullable = 1 /* 1 = NULL, 0 = NOT NULL */
WHERE o.type_desc = 'USER_TABLE'
AND o.type NOT IN ('PK','F','D') /* NOT Primary Key, Foreign Key or Default constraint */
Yes, it is fairly straightforward. Note: if the table contains a lot of records, I suggest using SELECT TOP 1000 * instead of SELECT *.
-- Identify records where a specific column is NOT NULL
SELECT *
FROM TableName
WHERE ColumnName IS NOT NULL
-- Identify the count of records where a specific column contains NULL
SELECT COUNT(1)
FROM TableName
WHERE ColumnName IS NULL
-- Identify all NULLable columns in a database
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES'
For more information on the INFORMATION_SCHEMA views, see this: https://learn.microsoft.com/en-us/sql/relational-databases/system-information-schema-views/system-information-schema-views-transact-sql
If you want to scan all tables and columns in a given database for NULLs, then it is a two-step process.
1.) Get the list of tables and columns that are NULLABLE.
-- Identify all NULLable columns in a database
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES'
2.) Use Excel to create a SELECT statement to get the NULL counts for each table/column. To do this, copy and paste the query results from step 1 into Excel. Assuming you have copied the header row, your data starts on row 2. In cell E2, enter the following formula.
="SELECT COUNT(1) FROM "&A2&"."&B2&"."&C2&" WHERE "&D2&" IS NULL"
Copy and paste that down the entire sheet. This will generate the SQL SELECT statement that you require. Copy the results in column E and paste into SQL Server and run it. This may take a while depending on the number of tables/columns to scan.
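If a spreadsheet is not at hand, the same statement generation can be sketched in a few lines of Python. This is a rough illustration under assumptions, not part of the answer above: the function name `null_count_statements` and the sample row are hypothetical, and unlike the Excel formula it wraps every identifier in square brackets so that names containing spaces or hyphens still produce valid statements.

```python
def null_count_statements(rows):
    """Build one NULL-count query per (catalog, schema, table, column) tuple."""

    def quote(name):
        # Bracket-quote an identifier, doubling any closing bracket,
        # which mirrors the default behaviour of T-SQL's QUOTENAME.
        return "[" + name.replace("]", "]]") + "]"

    statements = []
    for catalog, schema, table, column in rows:
        statements.append(
            "SELECT COUNT(1) FROM {}.{}.{} WHERE {} IS NULL".format(
                quote(catalog), quote(schema), quote(table), quote(column)
            )
        )
    return statements


# Hypothetical row, shaped like the output of the step 1 query.
rows = [("MyDB", "dbo", "OD-Map", "Map-Val2")]
for statement in null_count_statements(rows):
    print(statement)
```

The printed statements can then be pasted into SQL Server and run, exactly as with column E of the spreadsheet.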

How can I search multiple fields and count nulls for all?

Is there an easy way to count nulls in all fields in a table without writing 40+ very similar, but slightly different, queries? I would think there is some kind of statistics maintained for all tables, and this may be the easiest way to go with it, but I don't know for sure. Thoughts, anyone? Thanks!!
BTW, I am using SQL Server 2008.
Not sure if you consider this simple or not, but this will total the NULLs by column in a table.
DECLARE @table sysname;
SET @table = 'MyTable'; --replace this with your table name
DECLARE @colname sysname;
DECLARE @sql NVARCHAR(MAX);
DECLARE COLS CURSOR FOR
SELECT c.name
FROM sys.tables t
INNER JOIN sys.columns c ON t.object_id = c.object_id
WHERE t.name = @table;
SET @sql = 'SELECT ';
OPEN COLS;
FETCH NEXT FROM COLS INTO @colname;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- QUOTENAME keeps column names with spaces or special characters valid
    SET @sql = @sql + 'COUNT(CASE WHEN ' + QUOTENAME(@colname) + ' IS NULL THEN 1 END) AS ' + QUOTENAME(@colname + '_NULLS') + ','
    FETCH NEXT FROM COLS INTO @colname;
END;
CLOSE COLS;
DEALLOCATE COLS;
SET @sql = LEFT(@sql,LEN(@sql) - 1) --trim trailing comma
SET @sql = @sql + ' FROM ' + QUOTENAME(@table);
EXEC sp_executesql @sql;
SELECT COUNT( CASE WHEN field01 IS NULL THEN 1 END) +
COUNT( CASE WHEN field02 IS NULL THEN 1 END) +
...
COUNT( CASE WHEN field40 IS NULL THEN 1 END) as total_nulls
This answer will return a table containing the name of each column of a specified table. (@tab is the name of the table you're trying to count NULLs in.)
You can loop through the column names, count NULLs in each column, and add the result to a total running count.
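The loop over column names can also be used to build a single query that returns the running total in one pass. The following Python sketch shows the idea under assumptions: the function name `total_nulls_query` is hypothetical, the column list is expected to come from a metadata query such as the one described above, and identifiers are bracket-quoted in the T-SQL style.

```python
def total_nulls_query(table, columns):
    """Build one query that sums the NULL counts of all given columns."""
    if not columns:
        raise ValueError("at least one column name is required")

    def quote(name):
        # Bracket-quote an identifier, doubling any closing bracket.
        return "[" + name.replace("]", "]]") + "]"

    # One COUNT(CASE ...) term per column; COUNT skips the NULL that the
    # CASE yields for non-NULL values, so each term counts NULLs only.
    terms = [
        "COUNT(CASE WHEN {} IS NULL THEN 1 END)".format(quote(column))
        for column in columns
    ]
    return "SELECT {} AS total_nulls FROM {}".format(
        " + ".join(terms), quote(table)
    )


# Hypothetical table and column names.
print(total_nulls_query("MyTable", ["field01", "field02"]))
```

Because all terms sit in one SELECT, the table is scanned once rather than once per column.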

Find table information where part of column matches char variable

I'm trying to find any tables with columns containing the word date somewhere in the column name.
All of my queries are either all or nothing: they return all tables in the DB, or no results at all.
When I run a query without the variable, it works, as seen here.
select *
from MyDB.INFORMATION_SCHEMA.COLUMNS
where column_name like '%date%'
However, I can't get it to work by using a variable.
declare @temp varchar = 'date'
select *
from MyDB.INFORMATION_SCHEMA.COLUMNS
where column_name like '%' + @temp + '%'
The reason I'd like to do this is because I need to run this on more than one DB (such as below), and I have to perform this several times (for more than just date), and I'd like the process to go more smoothly.
select *
from MyDB1.INFORMATION_SCHEMA.COLUMNS
where column_name like '%date%'
union all
select *
from MyDB2.INFORMATION_SCHEMA.COLUMNS
where column_name like '%date%'
union all
select *
from MyDB3.INFORMATION_SCHEMA.COLUMNS
where column_name like '%date%'
One option is a single query that uses a cursor to loop through all the databases on your server; you can also restrict the loop to specific databases.
Query
DECLARE @DB_Name SYSNAME;
DECLARE @Sql NVARCHAR(MAX)= '';
DECLARE @cur CURSOR;
SET @Cur = CURSOR FOR
SELECT name
FROM sys.sysdatabases
--WHERE name IN ('DBName1', 'DBName2', 'DBName3'); --<-- uncomment this line and
-- specify the database names
OPEN @cur
FETCH NEXT FROM @Cur INTO @DB_Name
WHILE (@@FETCH_STATUS = 0)
BEGIN
    SET @Sql = N'
    SELECT t.name
    ,c.name
    FROM '+ QUOTENAME(@DB_Name) + '.sys.tables t
    INNER JOIN ' + QUOTENAME(@DB_Name) + '.sys.columns c ON c.object_id = t.object_id
    WHERE t.name LIKE ''%test%'''
    EXEC(@Sql)
    FETCH NEXT FROM @Cur INTO @DB_Name
END
CLOSE @cur
DEALLOCATE @cur
You can use whatever source of column info you prefer, but the issue you are having is caused by your variable declaration.
declare @temp varchar = 'date' is equivalent to
declare @temp varchar(1) = 'date', so
select @temp returns 'd', and you are getting any columns containing 'd'.
When you declare your variable, make sure it has a length sufficient to store the longest of the strings you will be searching for. Jason's answer will work too, but that is because of the variable declaration, not the source of the data.
I guess the problem is with your variable declaration.
By default, any variable declared with datatype VARCHAR and no length is treated as VARCHAR(1) in SQL Server.
declare @temp varchar = 'date'
print @temp --d
So try declaring your variable as VARCHAR(4).
declare @temp varchar(4) = 'date'
print @temp --date

SQL Find occurrences of value in a table regardless of columns

I was recently asked in an interview how I would get the count of apples, bananas and oranges in a table regardless of column information. The interviewer asked me to provide counts of occurrences of apples and bananas and skip oranges.
I have never written a query without column names before, please help.
Thanks
Credit for this approach goes to @GordonLinoff. Assume that you have 5 columns in the table:
select col_value, count(*) cnt from
(select (case when cols.num = 1 then t.col_1
when cols.num = 2 then t.col_2
when cols.num = 3 then t.col_3
when cols.num = 4 then t.col_4
when cols.num = 5 then t.col_5
end) as col_value
from table t cross join
(select level as num from dual connect by level <= 5) cols)
where col_value in ('Apple', 'Banana')
group by col_value
order by 1;
The table will be full-scanned, but only once, so this is more efficient than a UNION ALL over all the columns. You can also rewrite this query using column information from the data dictionary and dynamic SQL, so that it is applicable to any table and any number of columns.
Here is the way I would approach the issue. Note I used a table in the database I have the misfortune to support. Also this would need some modification and then one would need to query the table where rows were inserted.
declare @columnName nvarchar(128)
declare @command nvarchar(max)
--drop table interview_idiocy
create table interview_idiocy
(
Column_name varchar(128),
Fruit varchar(50)
)
declare interview_idiocy cursor for
select
column_name
from
information_schema.columns
where
table_name = 'People'
AND data_type in ('varchar', 'char')
open interview_idiocy
fetch next from interview_idiocy into @columnName
WHILE @@FETCH_STATUS = 0
Begin
    set @command = 'insert interview_idiocy select count(' + @columnName +'),' + @columnName + ' from people where ' + @columnName + ' = ''apple'' group by ' + @columnName
    exec sp_executesql @command
    print @command
    fetch next from interview_idiocy into @columnName
end
close interview_idiocy
deallocate interview_idiocy

Search sql database for a column name, then search for a value within the returned columns

This query will search a database for a specific column name. I would like to go one step further and search the returned columns for a specific value.
SELECT t.name AS table_name,
SCHEMA_NAME(schema_id) AS schema_name,
c.name AS column_name
FROM sys.tables AS t
INNER JOIN sys.columns c ON t.OBJECT_ID = c.OBJECT_ID
WHERE c.name LIKE '%Example%'
Any ideas?
Many thanks
For example, I have a database named Organisation. I have more than one table where tax_id column is present.
Most of the time, we have to find such a column from the whole database.
The solution is provided below:
select table_name,column_name from information_schema.columns
where column_name like '%tax%'
The database name does not matter in this query; just change the column name to the one you want and you will get the required result.
To search the whole database for a value such as computer, and find which tables and columns contain it:
First we need to write a stored procedure, which we can then reuse for our searches. I got it from http://vyaskn.tripod.com/search_all_columns_in_all_tables.htm and it gives very good results.
After executing the stored procedure, we get a complete list of where the keyword computer appears in the whole database.
That was the concept for solving it. The exact query fulfilling the above requirement is below:
Select tax_id from (select table_name from information_schema.columns
where column_name = 'tax_id') as temp
There is no system table for this kind of search, but you can try the following for your purpose:
DECLARE @ValueToSearch NVARCHAR(500)
DECLARE @SearchColumn NVARCHAR(100)
DECLARE @TableName NVARCHAR(200)
DECLARE @ColumnName NVARCHAR(200)
SET @ValueToSearch ='YOUR VALUE TO SEARCH'
SET @SearchColumn = 'YOUR COLUMN'
DECLARE @getResult CURSOR
SET @getResult = CURSOR FOR
SELECT t.name AS table_name,c.name AS column_name FROM sys.tables AS t INNER JOIN sys.columns c ON t.OBJECT_ID = c.OBJECT_ID WHERE c.name = @SearchColumn
OPEN @getResult
FETCH NEXT FROM @getResult INTO @TableName,@ColumnName
WHILE @@FETCH_STATUS = 0
BEGIN
    SET NOCOUNT ON ;
    DECLARE @RESULT INT;
    DECLARE @TYPE INT
    DECLARE @QUERY NVARCHAR(1000)
    SET @QUERY = 'select @RESULT=count(*) from ' + ISNULL(@TableName,'') +' WHERE '+ ISNULL(@ColumnName,'')+'='''+ ISNULL(@ValueToSearch,'') +''''
    EXEC sp_executesql @QUERY,
    N'@result int OUTPUT, @type int OUTPUT',
    @RESULT OUTPUT,
    @TYPE OUTPUT
    IF(ISNULL(@RESULT,0)>0)
    BEGIN
        SET NOCOUNT ON;
        SELECT ' COLUMN '+ @ColumnName + ' OF TABLE ' +@TableName+ ' HAS THIS VALUE.'
    END
    FETCH NEXT FROM @getResult INTO @TableName,@ColumnName
END
CLOSE @getResult
DEALLOCATE @getResult
Thanks
Manoj